2

I have a piece of code where I have to look in a bunch of vectors for the first element which satisfies a given predicate. I have to do this multiple times, and as there are multiple elements which satisfy my predicate, I'd like to somewhat randomize the order in which I check, so to give everybody a fair chance of being discovered.

At the same time, I'd like this to be as performant as possible, so I really don't want to shuffle/allocate or anything of the sort.

This requirements have sent me into a spiral of random number generation, linear congruential generators, LFSRs, Xorshifts, criptography, but I'm having a hard time figuring a good solution out.

I realize that picking a permutation truly randomly is impossible. What I'm looking for is a generator which I can pass

  • N
  • a seed
  • some number of random bits (in the form of number parameters generated from a separate PRNG)

and it will cycle through one of the permutations of N elements (picked pseudo-randomly).

This answer provides what I think could be a good starting point; a generator for 16 bits in the form of

P(x) = ((x ^ 0xf) ^ (x << 2) + 3) & 0xf 

Which one can iterate over discarding any number > N. I like this since it does not require me to find a prime number, only the next highest power of 2 which is fast with bit-fiddling. I have played with this and I have a generator of the form

P(x) = ((x ^ (next_pow2 - 1)) ^ (x << z) + y) & (next_pow2 - 1)

Picking z and y randomly gives me different permutations. However, I don't understand exactly how this works, whether it can be improved, and in what range y and z should be to be as fair as possible (since changing parameters will result in duplicating some permutations).

Could anyone point me out whether there's a better solution, and what to read exactly to learn more? This field looks terribly complicated for a novice, and I don't know where to start.

Svalorzen
  • 5,353
  • 3
  • 30
  • 54
  • 2
    If the only reason you don't want to shuffle is a concern about performance, I urge you to rethink. Shuffling is very cheap (one swap per element selected). It's only a problem if you need to preserve the original order and you don't have space for a copy. – rici Mar 30 '19 at 18:10
  • @rici I am leaving shuffling as a last ditch resort; I do have some vectors where the order is important, and I really wanted to avoid the extra allocations required to shuffle the indices separately. Currently >35% of my running time depends on allocations/deallocations so I'm trying to save there as much as possible. – Svalorzen Mar 31 '19 at 10:37
  • That seems excessive, so perhaps you need to confront the existing problem by allocating storage more efficiently. But please note that the time taken to allocate storage is not related to the amount of storage in a single allocation; it is related to the number of allocations (and other factors). So allocating twice as much memory in a single allocation does not have a significant cost in execution time, which is why I said that copying is a problem only if you don't have enough space. – rici Mar 31 '19 at 20:42
  • How large is your `N`? – Severin Pappadeux Apr 01 '19 at 00:41
  • @SeverinPappadeux Sometimes very short, sometimes hundreds, sometimes tens of thousands (depends on the problem). – Svalorzen Apr 01 '19 at 12:29
  • @rici I know; I'm working to optimize several things at once, and this is one of the problems I'm facing. – Svalorzen Apr 01 '19 at 12:31

0 Answers0