4

I have a set of exactly 16,704,200 unique objects. I need to construct a function f such that:

  • f(x) returns a seemingly random object from the list (but always the same object for a given value of x)

  • f(0) through f(16704199) returns the complete set of objects (no duplicates) in that seemingly random order

  • f doesn't need to store a list of 16,704,200 ordered integers

I've looked at several SO answers about using pseudo-random number generators or linear feedback shift registers to generate sequences of random numbers. The disadvantage there would be the only way to find the value of f(7000) would be to initialize the register, loop 7000 times, and then return the number. (Unless I stored the entire pre-generated sequence, which as stated above I'd prefer not to do.)

Are there any algorithms better suited to finding the 7000th (xth) entry in a randomized sequence?

Elliot Nelson
  • 11,371
  • 3
  • 30
  • 44
  • In essence you are looking for a *reversible* map (0..16_704_199) to itself, where the relationships are "random". Obviously true random is out, and most PRNGs are out if you don't want to store or iterate. You might be able to construct a reversible map, but the randomness is likely to be simple or weak - would that be acceptable? – Neil Slater May 08 '14 at 11:23
  • @NeilSlater Yes, in my situation, weak randomness is totally fine, as long as any given sequence of `f` -- say, any given set of 100 sequential `x` -- distributes somewhat uniformly across the list. – Elliot Nelson May 08 '14 at 11:26
  • 1
    You might be able to construct something using "old school" RNG: http://en.wikipedia.org/wiki/Linear_congruential_generator - also see http://stackoverflow.com/questions/2911432/reversible-pseudo-random-sequence-generator – Neil Slater May 08 '14 at 11:28
  • I wonder there is such algorithm that satisfy point 2 and 3, if the order of objects in the original set cannot be changed. – Arie Xiao May 08 '14 at 12:07

2 Answers2

4

You can use a Linear Congruential Generator - this type of PRNG is considered very crude nowadays for any purpose requiring statistical randomness, but does have an advantage in your case that it can be made to repeat a specific sequence of known size. It also happens to be reversible, and this is related to your requirement of 1-to-1 mapping between sequence id and selected index id.

First, pick a couple of prime numbers, somewhere between 60% and 80% of your total size N.

N = 16_704_200
A =  9_227_917
C = 11_979_739

You can use the Prime module to find your numbers. You can even select them using a PRNG, and only store the prime numbers that you need.

Now you have these values, you can implement the LCG algorithm, which is your desired f(x):

def lcg x
  ( A * x + C ) % N
end

A quick test:

lcg( 0 )
# => 11979739

lcg( 12345 )
# => 7971104

(0..9).map { |x| lcg( x) }
 # => [ 11979739, 4503456, 13731373, 6255090, 15483007,
 #      8006724, 530441, 9758358, 2282075, 11509992 ]

. . . well it might be random, and if you feed back the output as next input parameter then you have an "old school" (and very low quality) PRNG. But you can just use it for index_id = lcg( sequence_id ) to fetch your objects in a random-looking sequence.

Does it map the whole set of input values to the same set of output values:

(0...N).map { |x| lcg( x ) }.uniq.count
# => 16704200

Yes!


Although you don't need it, the algorithm can be reversed. Here's how to do it:

The tricky bit is figuring out the multiplicative inverse of A. Here is an example of how to do that I found.

AINVERSE = 9257653
# Test it:
( A * AINVERSE ) % N 
# => 1

Now you have these values, you can implement the LCG algorithm forwards and backwards:

def lcg_fwd x
  ( A * x + C ) % N
end

def lcg_rev x
  ( AINVERSE * ( x - C ) ) % N
end

Test it:

lcg_fwd( 0 )
# => 11979739
lcg_rev( 11979739 )
# => 0

lcg_fwd( 12345 )
# => 7971104
lcg_rev( 7971104 )
# => 12345
Neil Slater
  • 26,512
  • 6
  • 76
  • 94
  • 1
    Thanks for your help, Neil! I'm marking this answer correct, because it (and your comments) led me down the right path. I'm actually going with an even simpler solution: `f(x) => (x * 9_227_917 + 11_979_739) % 16_704_200`. I know this is extremely weak randomness, but it satisfies all of my requirements and is indexable (instead of forward/backward). – Elliot Nelson May 08 '14 at 12:25
  • @Elliot Nelson: That is exactly what I meant. It becomes a (weak) PRNG for `x` if you do `x = lcg_fwd( x )` repeatedly . . . but otherwise `lcg_fwd(x)` *is* the `f(x)` you were looking for. Only reason to reverse it is if you end up knowing the target you picked, but not the index position you are meant to be in. The reason for even mentioning the reverse algorithm is that it is an implied property of the sequence you need. – Neil Slater May 08 '14 at 12:28
0

Perhaps a pre-seeded Random object might do the trick?

prng1 = Random.new(1234)
prng1.seed       #=> 1234
prng1.rand(100)  #=> 47
prng1.rand(99)   #=> 83

prng2 = Random.new(prng1.seed)
prng2.rand(100)  #=> 47
prng2.rand(99)   #=> 83

http://www.ruby-doc.org/core-2.1.1/Random.html

If you pick values large enough, you'll get unique numbers:

(1..1_000_000).map {|i| prng1.rand(1_000_000_000_000+i)}.uniq.size
=> 1000000
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154