18

I am researching how python implements dictionaries. One of the equations in the python dictionary implementation relates the pseudo random probing for an empty dictionary slot using the equation

j = ((j*5) + 1) % 2**i

which is explained here.

I have read this question, How are Python's Built In Dictionaries Implemented?, and basically understand how dictionaries are implemented.

What I don't understand is why/how the equation:

j = ((j*5) + 1) % 2**i   

cycles through all the remainders of 2**i. For instance, if i = 3 for a total starting size of 8. j goes through the cycle:

0
1
6
7
4
5
2
3
0

if the starting size is 16, it would go through the cycle:

0 1 6 15 12 13 2 11 8 9 14 7 4 5 10 3 0

This is very useful for probing all the slots in the dictionary. But why does it work ? Why does j = ((j*5)+1) work but not j = ((j*6)+1) or j = ((j*3)+1) both of which get stuck in smaller cycles.

I am hoping to get a more intuitive understanding of this than the equation just works and that's why they used it.

HoldOffHunger
  • 18,769
  • 10
  • 104
  • 133
Fairly Nerdy
  • 863
  • 1
  • 8
  • 9

1 Answers1

13

This is the same principle that pseudo-random number generators use, as Jasper hinted at, namely linear congruential generators. A linear congruential generator is a sequence that follows the relationship X_(n+1) = (a * X_n + c) mod m. From the wiki page,

The period of a general LCG is at most m, and for some choices of factor a much less than that. The LCG will have a full period for all seed values if and only if:

  1. m and c are relatively prime.
  2. a - 1 is divisible by all prime factors of m.
  3. a - 1 is divisible by 4 if m is divisible by 4.

It's clear to see that 5 is the smallest a to satisfy these requirements, namely

  1. 2^i and 1 are relatively prime.
  2. 4 is divisible by 2.
  3. 4 is divisible by 4.

Also interestingly, 5 is not the only number that satisfies these conditions. 9 will also work. Taking m to be 16, using j=(9*j+1)%16 yields

0 1 10 11 4 5 14 15 8 9 2 3 12 13 6 7

The proof for these three conditions can be found in the original Hull-Dobell paper on page 5, along with a bunch of other PRNG-related theorems that also may be of interest.

kevmo314
  • 4,223
  • 4
  • 32
  • 43
  • Don't want to be pedantic but I think it's *pseudo random* not *random*;) – styvane May 22 '16 at 20:33
  • 1
    It can't be "if and only if" in the statement of the theorem; e.g. taking `c = 0` and `a` to be a primitive root modulo 2^i will result in a full period, but `0` isn't coprime with `m`. – Sven Marnach May 22 '16 at 20:46
  • @SvenMarnach Yeah, I was just wondering about that as well. It looks like the original paper suggests sufficiency, however the "if and only if" comes from the Knuth citation. Knuth also enforces a nonzero `c`, so I think the wiki summarization is just incomplete for this special case. I'll fix the wiki page... – kevmo314 May 22 '16 at 20:58
  • 1
    @kevmo314 Thanks for the response. I accepted this as the right answer, but I'm not sure I completely understand every situation. For instance, with a = 7, c = 1, m =13 I get a period length of 12 instead of 13, but that appears to satisfy all 3 requirements. On the other hand, with a = 7, c=1, m = 18 it works perfectly and I get a period length of 18 as expected – Fairly Nerdy May 22 '16 at 22:43
  • 1
    I suppose having m = 13 in my previous comment might not satisfy condition 2 since 13 has no prime factors (since it is itself prime) – Fairly Nerdy May 22 '16 at 22:52
  • 2
    Knuth btw. also mentions that *“when m is the product of distinct primes, only a = 1 will produce the full period”*, so that should give you an idea on what not to choose for `m` :) – poke May 22 '16 at 22:53
  • 1
    @FairlyNerdy Your followup is correct, 13's prime factors include 13, which 6 is not divisible by. If `m` is composed of distinct prime factors, then the only solution will be a = 1, as @poke notes. – kevmo314 May 22 '16 at 23:47