It said:
We start with some definitions. For a k-bucket covering the distance range 2i,2i+1 , define the index of the bucket to be i. Define the depth, h, of a node to be 160 − i, where i is the smallest index of a non-empty bucket. Define node y’s bucket height in node x to be the index of the bucket into which x would insert y minus the index of x’s least significant empty bucket. Because node IDs are randomly chosen, it follows that highly non-uniform distributions are unlikely. Thus with overwhelming probability the height of a any given node will be within a constant of log n for a system with n nodes. Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.
I can understand the definition of bucket height, but I don't know why we need that definition, and I don't understand the last sentence of the paragraph.
Updates: I also think that the paper has a typo: the bucket height should be the index of the bucket containing y minus the index of x’s least significant "NON-"empty bucket. Am I wrong?