前言
snowflake算法是Twitter技术团队在2010年开源的分布式ID的生成算法,后续美团和百度都相应的根据该算法进行了改进,并且开源了其分布式ID生成算法。本文将详细介绍snowflake算法的数据结构以及其工作原理
数据结构
snowflake算法生成的分布式ID是一个一个64bit大小的整数,其结构如下:
1位,占位符不用,我们知道一般第一位表示正负数的符号,所以固定位为0
41位,用来记录时间的时间戳(毫秒)
41位可以表示的最大的数字为2 41 2^{41} 2 4 1 ,其可以使用的最长的时间为
2 41 ÷ ( 1000 × 60 × 60 × 24 × 356 ) = 71 2^{41}\div(1000\times60\times60\times24\times356)=71 2 4 1 ÷ ( 1 0 0 0 × 6 0 × 6 0 × 2 4 × 3 5 6 ) = 7 1 年
10位,工作机器ID,可以部署在2 10 = 1024 2^{10}=1024 2 1 0 = 1 0 2 4 个结点,其中5位为workerId,5位为datacenterId
12位,序列号,用来记录同毫秒内产生的ID,可以表示2 12 = 4096 2^{12}=4096 2 1 2 = 4 0 9 6 个序列号
代码详解
Twitter的官方是用Scala写的,在这里我改成了Java版
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 package 数组;public class IdWorker { private long workerId; private long datacenterId; private long twepoch = 1288834974657L ; private long workerIdBits = 5L ; private long datacenterIdBits = 5L ; private long maxWorkerId = -1L ^ (-1L << workerIdBits); private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); private long sequenceBits = 12L ; private long workerIdShift = sequenceBits; private long datacenterIdShift = sequenceBits + workerIdBits; private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; private long sequenceMask = -1L ^ (-1L << sequenceBits); private long lastTimestamp = -1L ; private long sequence = 0L ; public IdWorker (Long workerId, Long datacenterId) { if (workerId > maxWorkerId || workerId < 0 ) { System.out.println("worker Id can't be greater than %d or less than 0" ); } if (datacenterId > maxDatacenterId || datacenterId < 0 ) { System.out.println("datacenter Id can't be greater than %d or less than 0" ); } this .workerId = workerId; this .datacenterId = datacenterId; } private Long getWorkerId () { return workerId; } private Long getDataCenterId () { return datacenterId; } private Long getTimestamp () { return System.currentTimeMillis(); } public synchronized Long getId () { Long timestamp = getTimeGen(); if (timestamp < lastTimestamp) { System.out.println("error,clock is moving backwards. Rejecting requests until " ); } if (lastTimestamp == timestamp) { sequence = (sequence + 1 ) & sequenceMask; if (sequence == 0 ) { timestamp = getNextMills(lastTimestamp); } } else { sequence = 0L ; } lastTimestamp = timestamp; return ((timestamp - twepoch) << timestampLeftShift) | (datacenterId << datacenterIdShift) | (workerId << workerIdShift) | sequence; } protected Long getNextMills (Long lastTimestamp) { Long timestamp = getTimeGen(); while (timestamp <= lastTimestamp) { timestamp = getTimeGen(); } return timestamp; } protected Long getTimeGen () { return System.currentTimeMillis(); } }
该算法的整体思路如下:
获取ID的时候,先获取当前的时间戳,和上一个时间戳比较,如果小,则产生了时钟回拨,错误。
如果时间戳相等则序列号加1和序列号最大值求余,当序列号为0的时候则阻塞到下个毫秒
如果时间戳大于上一个时间戳,则序列号赋值为0
通过位运算生成分布式ID结果