I'm in the middle of implementing my own DHT for an internal cluster. Since it will be used in a file-sharing program similar to BitTorrent, "Mainline DHT" was the first thing I looked at. After that I found "entangled" (Python, a DHT built on Twisted), congress (Python, a DHT using pyev + libev), and of course the original "kademlia".
They take different approaches to organizing k-buckets:
1) congress and kademlia use a fixed array of 160 buckets, where bucket i holds IDs whose XOR distance from us satisfies 2^i <= distance < 2^(i+1), for 0 <= i < 160.
2) Mainline DHT and entangled use dynamic buckets. At start there is just one bucket covering the whole ID space. Once it fills up with 8 live nodes, it is split into two new buckets, but ONLY if our own ID lies inside that bucket; otherwise it is never split. So fairly soon we end up with up to 160 buckets closest to us and a few others (see the sketch after this list).
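A minimal sketch of that split rule in Python (my own code, not taken from Mainline DHT or entangled; `KBucket`, `RoutingTable`, and `K = 8` are just illustrative names and values):

```python
K = 8  # bucket capacity used by Mainline DHT / entangled


class KBucket:
    """A bucket covering the absolute ID range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.nodes = []  # up to K live node IDs

    def covers(self, node_id):
        return self.lo <= node_id < self.hi

    def split(self):
        mid = (self.lo + self.hi) // 2
        left, right = KBucket(self.lo, mid), KBucket(mid, self.hi)
        for n in self.nodes:
            (left if left.covers(n) else right).nodes.append(n)
        return left, right


class RoutingTable:
    def __init__(self, own_id):
        self.own_id = own_id
        self.buckets = [KBucket(0, 2 ** 160)]  # start: one bucket over the whole space

    def add(self, node_id):
        bucket = next(b for b in self.buckets if b.covers(node_id))
        if node_id in bucket.nodes:
            return
        if len(bucket.nodes) < K:
            bucket.nodes.append(node_id)
        elif bucket.covers(self.own_id):
            # Full bucket: split it, but only because our own ID is inside it.
            i = self.buckets.index(bucket)
            self.buckets[i:i + 1] = bucket.split()
            self.add(node_id)  # retry against the two smaller buckets
        # else: full bucket far from us -> discard (real code would ping/replace)
```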
Both variants are good enough. But I have found a HUGE difference in the logic that decides whether some ID belongs to some bucket or not, and that is what my question is about.
congress and kademlia treat bucket boundaries as "minimum distance from us" and "maximum distance from us". So our own ID will ALWAYS end up in bucket 0. Bucket 1 can hold at most 2 other IDs (because it covers distances 2^1 <= x < 2^2), and those will ALWAYS be the ones closest to us. So my brain doesn't break, because everything is OK.
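In code, that membership test boils down to "which power of two does the XOR distance fall into". A minimal sketch (`bucket_index` is my own name, not something from congress or kademlia):

```python
def bucket_index(own_id, other_id):
    """Return i such that 2**i <= (own_id ^ other_id) < 2**(i + 1)."""
    d = own_id ^ other_id
    if d == 0:
        return 0  # distance 0 is ourselves; treat it as bucket 0, as described above
    return d.bit_length() - 1


own = 0b10100000
# Bucket 1 covers distances 2 and 3, i.e. exactly two possible IDs:
assert bucket_index(own, own ^ 2) == 1
assert bucket_index(own, own ^ 3) == 1
```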
But if you look into Mainline DHT or entangled, you will see that bucket boundaries are treated as absolute node ID boundaries, not XOR distances! So in a theoretically full table, the IDs 0, 1, 2, 3, 4, 5, 6, 7 will all sit in one bucket.
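To make the contrast concrete, here is the same kind of membership test in the absolute-ID style, where our own ID never enters the comparison (again just an illustrative sketch, not actual Mainline DHT or entangled code):

```python
def in_bucket(node_id, lo, hi):
    # Mainline DHT / entangled style: a plain range test on the raw ID value.
    # Note that our own ID does not appear in the comparison at all.
    return lo <= node_id < hi


# A bucket covering the raw range [0, 8) holds the IDs 0..7, no matter
# what our own ID is or how far away those IDs are in XOR distance:
assert all(in_bucket(i, 0, 8) for i in range(8))
```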
So: why do some implementations treat bucket boundaries as "min/max distance from us", while others treat them as "min/max 160-bit integer value"?