6

I have an array of given size. I want to traverse it in pseudorandom order, keeping array intact and visiting each element once. It will be best if current state can be stored in a few integers.

I know you can't have full randomness without storing full array, but I don't need the order to be really random. I need it to be perceived as random by user. The solution should use sub-linear space.

One possible suggestion - using large prime number - is given here. The problem with this solution is that there is an obvious fixed step (taken module array size). I would prefer a solution which is not so obviously non-random. Is there a better solution?

Michal
  • 565
  • 5
  • 15
  • 1
    Are you happy with visiting a given element more than once or not? i.e. are you looking for a random permutation of the elements or for a random sequence of elements? The question you're quoting is concerned with the first of these problems (also known as shuffling). – Walter Mar 04 '15 at 11:53
  • 1
    Many (most ?) pseudo-random number generators can be defined by an algorithm and an integer or two to describe their current state. Given those factors their future behaviour is entirely deterministic though apparently random -- that's what *pseudo* means here. What are you looking for that isn't a PRNG ? – High Performance Mark Mar 04 '15 at 11:59
  • Obviously, I need to avoid repetitions (otherwise I can just jump to random element very time). I edited the question, adding that information. In fact, desired solution should store current state in a way that allows to traverse N elements in N steps from any state. – Michal Mar 04 '15 at 12:10
  • *Obviously, I need to avoid repetitions* No, it wasn't obvious at all prior to your edit, and it has a material impact on potential solutions. – High Performance Mark Mar 04 '15 at 12:15
  • TBH, I think the original was pretty clear already, `I want to traverse it is pseudorandom order`. Traverse really implies each element is visited once, but it could be just me and my bad english. – amit Mar 04 '15 at 12:17

6 Answers6

2

The idea of a random generator that simulates a shuffle is good if you can get one whose maximum period you can control.

A Linear Congruential Generator calculates a random number with the formula:

x[i + 1] = (a * x[i] + c) % m;

The maximum period is m and it is achieved when the following properties hold:

  • The parameters c and m are relatively prime.
  • For every prime number r dividing m, a - 1 is a multiple of r.
  • If m is a multiple of 4 then also a - 1 is multiple of 4.

My first darft involved making m the next multiple of 4 after the array length and then finding suitable a and c values. This was (a) a lot of work and (b) yielded very obvious results sometimes.

I've rethought this approach. We can make m the smallest power of two that the array length will fit in. The only prime factor of m is then 2, which will make every odd number relatively prime to it. With the exception of 1 and 2, m will be divisible by 4, which means that we must make a - 1 a multiple of 4.

Having a greater m than the array length means that we must discard all values that are illegal array indices. This will happen at most every other turn and should be negligible.

The following code yields pseudo random numbers with a period of exaclty m. I've avoided trivial values for a and c and on my (not too numerous) spot cheks, the results looked okay. At least there was no obvious cycling pattern.

So:

class RandomIndexer
{
public:
    RandomIndexer(size_t length) : len(length)
    {
        m = 8;
        while (m < length) m <<= 1;

        c = m / 6 + uniform(5 * m / 6);
        c |= 1;

        a = m / 12 * uniform(m / 6);
        a = 4*a + 1;
        x = uniform(m);                      
    }

    size_t next()
    {
        do { x = (a*x + c) % m; } while (x >= len);

        return x;
    }

private:
    static size_t uniform(size_t m)
    {
        double p = std::rand() / (1.0 + RAND_MAX);

        return static_cast<int>(m * p);
    }

    size_t len;
    size_t x;
    size_t a;
    size_t c;
    size_t m;
};

You can then use the generator like this:

std::vector<int> list;
for (size_t i = 0; i < 3; i++) list.push_back(i);

RandomIndexer ix(list.size());
for (size_t i = 0; i < list.size(); i++) {
    std::cout << list[ix.next()]<< std::endl;
}

I am aware that this still isn't a great random number generator, but it is reasonably fast, doesn't require a copy of the array and seems to work okay.

If the approach of picking a and c randomly yields bad results, it might be a good idea to restrict the generator to some powers of two and to hard-code literature values that have proven to be good.

M Oehm
  • 28,726
  • 3
  • 31
  • 42
  • Are you sure ```while (p*p < m)``` condition is enough? What if there is a larger prime dividing m? For example, 956 = 4 * 239 and your algorithm (if I checked correctly) yields a=5, which is wrong. – Michal Mar 04 '15 at 14:16
  • Another problem with this solution is that it doesn't "look" random - it is easy to see that there is a fixed step between two subsequent items. – Michal Mar 04 '15 at 14:45
  • You are right. We must find all factors. In cases where _m_ is a product of 4 and a prime, _a_ is effectively 1. (It's _m_ + 1, but modulo arithmetic means it is as good as 1.) I've corrected the code. – M Oehm Mar 04 '15 at 14:52
  • The LCG isn't a good gerenator. The case _a_ = 1 makes it a very bad one. In this case, _a_ should probably multiplicated with another (random?) number to make it less obvious. Finding good sets of parameters requires some study. (The pattern is, of course, more ovious if the array is just an enumeration. Since the point of the exercise is to index an array, the structure of the array might hide the regularity.) – M Oehm Mar 04 '15 at 14:57
  • Okay, change of plan. This whole business of making _m_ as close to the array length as possible is nonsense. We can skip the whole prime-finding business and just use the next power of two. I've updated the code and run some checks and the results look okay to me. (There's probably still a lot of room for improvement, but I consider it a working solution.) – M Oehm Mar 04 '15 at 19:17
2

How about this algorithm?

To pseudo-pseudo randomly traverse an array of size n.

  1. Create a small array of size k
  2. Use the large prime number method to fill the small array, i = 0
  3. Randomly remove a position using a RNG from the small array, i += 1
  4. if i < n - k then add a new position using the large prime number method
  5. if i < n goto 3.

the higher k is the more randomness you get. This approach will allow you to delay generating numbers from the prime number method.

A similar approach can be done to generate a number earlier than expected in the sequence by creating another array, "skip-list". Randomly pick items later in the sequence, use them to traverse the next position, and then add them to the skip-list. When they naturally arrive they are searched for in the skip-list and suppressed and then removed from the skip-list at which point you can randomly add another item to the skip-list.

ryanpattison
  • 6,151
  • 1
  • 21
  • 28
  • I ended up with a variation on skiplist theme. All items are generated by Large Prime Number method, but every time I need to pick a number, I first add random number of items to the skiplist (up to a small fixed size). Then, I either choose randomly from the skiplist or return next number calculated by large prime number method (this is also decided randomly). It is very easy to implement and the numbers look really random. – Michal Mar 05 '15 at 11:22
0

As pointed out by others, you can create a sort of "flight plan" upfront by shuffling an array of array indices and then follow it. This violates the "it will be best if current state can be stored in a few integers" constraint but does it really matter? Are there tight performance constraints? After all, I believe that if you don't accept repetitions, than you need to store the items you already visited somewhere or somehow.

Alternatively, you can opt for an intrusive solution and store a bool inside each element of the array, telling you whether the element was already selected or not. This can be done in an almost clean way by employing inheritance (multiple as needed).
Many problems come with this solution, e.g. thread safety, and of course it violates the "keep the array intact" constraint.

gd1
  • 11,300
  • 7
  • 49
  • 88
  • And again, the OP asked no storing an array in similar size, or in other words - he asks for a solution in o(n) space [small o here] . – amit Mar 04 '15 at 12:18
  • I understood that, and in fact I pointed this out very clearly. I am proposing partial, alternative solutions that relax some constraints, and can be possibly a good trade-off. For example, the intrusive solution may save a few bytes per element if compared to shuffling an array of indices. Question is: have you read the answer? ;) – gd1 Mar 04 '15 at 12:20
  • Perhaps this could be combined with use of a random number generator whose seed can be forced. The program would pick one random seed. Each time the shuffled data is needed it could be reproduced from the original array and the seed. – Patricia Shanahan Mar 04 '15 at 12:38
  • @gd1 I was assuming the random number generator would be used to drive the obvious choice of a [Fisher-Yates shuffle](http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle). – Patricia Shanahan Mar 04 '15 at 12:50
  • @PatriciaShanahan : the OP doesn't want to modify the input array and ideally she/he wants not to use extra space. That's what makes this thing difficult, of course anybody can call std::random_shuffle, and that's gonna be Fisher-Yates I guess – gd1 Mar 04 '15 at 13:13
  • @gd1 Of course I read the answer, it offers O(n) space solutions, which is quite obviously not what the OP is asking. – amit Mar 04 '15 at 14:09
  • @amit : I didn't claim I provided the solution. I guess SO is also about alternatives, trade-offs, and trying to point the OP to a direction that is not necessarily the one she/he expects. I provided a point of view, I formalised the answer that Kamil just suggested with a link, and I expressed my humble opinion that what the OP asks is actually impossible if she/he does not relax some of the constraints. I don't think that here someone asks for an algorithm with a given complexity, and either you respond to that requirement by spitting out the code/solution/link, or your answer is crap. – gd1 Mar 04 '15 at 14:17
  • I am not convinced it cannot be done, as "real" randomness is not required. And if it cannot be done, then random_shuffle is the simplest solution for me. – Michal Mar 04 '15 at 14:43
  • 1
    I see your point but it is actually very difficult to reach consensus on your "unreal" randomness, as it is a hazy concept that is quite difficult to formalise. Even if we come up with a solution that is "somewhat" random, you may say it is not random enough, or not random in the pseudo-sense you meant it to be, and when we point out that true randomness is unachievable, you may always say that you don't need it, because you're fine with "some" randomness of yours. I'll give it some thought :) – gd1 Mar 04 '15 at 16:01
0

Quadratic residues which you have mentioned ("using a large prime") are well-known, will work, and guarantee iterating each and every element exactly once (if that is required, but it seems that's not strictly the case?). Unluckily they are not "very random looking", and there are a few other requirements to the modulo in addition to being prime for it to work.
There is a page on Jeff Preshing's site which describes the technique in detail and suggests to feed the output of the residue generator into the generator again with a fixed offset.

However, since you said that you merely need "perceived as random by user", it seems that you might be able to do with feeding a hash function (say, cityhash or siphash) with consecutive integers. The output will be a "random" integer, and at least so far there will be a strict 1:1 mapping (since there are a lot more possible hash values than there are inputs).

Now the problem is that your array is most likely not that large, so you need to somehow reduce the range of these generated indices without generating duplicates (which is tough).

The obvious solution (taking the modulo) will not work, as it pretty much guarantees that you get a lot of duplicates.

Using a bitmask to limit the range to the next greater power of two should work without introducing bias, and discarding indices that are out of bounds (generating a new index) should work as well. Note that this needs non-deterministic time -- but the combination of these two should work reasonably well (a couple of tries at most) on the average.

Otherwise, the only solution that "really works" is shuffling an array of indices as pointed out by Kamil Kilolajczyk (though you don't want that).

Damon
  • 67,688
  • 20
  • 135
  • 185
0

Here is a java solution, which can be easily converted to C++ and similar to M Oehm's solution above, albeit with a different way of choosing LCG parameters.

import java.util.Enumeration;
import java.util.Random;

public class RandomPermuteIterator implements Enumeration<Long> {
    int c = 1013904223, a = 1664525;
    long seed, N, m, next;
    boolean hasNext = true;

    public RandomPermuteIterator(long N) throws Exception {
        if (N <= 0 || N > Math.pow(2, 62)) throw new Exception("Unsupported size: " + N);
        this.N = N;
        m = (long) Math.pow(2, Math.ceil(Math.log(N) / Math.log(2)));
        next = seed = new Random().nextInt((int) Math.min(N, Integer.MAX_VALUE));
    }

    public static void main(String[] args) throws Exception {
        RandomPermuteIterator r = new RandomPermuteIterator(100);
        while (r.hasMoreElements()) System.out.print(r.nextElement() + " ");
        //output:50 52 3 6 45 40 26 49 92 11 80 2 4 19 86 61 65 44 27 62 5 32 82 9 84 35 38 77 72 7 ...
    }

    @Override
    public boolean hasMoreElements() {
        return hasNext;
    }

    @Override
    public Long nextElement() {
        next = (a * next + c) % m;
        while (next >= N) next = (a * next + c) % m;
        if (next == seed) hasNext = false;
        return  next;
    }
}
Matthew Franglen
  • 4,441
  • 22
  • 32
aykutfirat
  • 469
  • 5
  • 4
-1

maybe you could use this one: http://www.cplusplus.com/reference/algorithm/random_shuffle/ ?

  • The OP asks without manipulating the array or storing a new one. – amit Mar 04 '15 at 12:03
  • but using this, you can create new simple array of indices 0..length-1 (not a clone of array of objects) , random_shuffle them and get objects using these shuffled indices – Kamil Mikolajczyk Mar 04 '15 at 12:07
  • 1
    My condition was to avoid extra data of similar size. Obviously, if we allow a copy of array, random_shuffle will do. – Michal Mar 04 '15 at 12:10
  • yeah, but I'm suggesting creating temp array of ints which is less size than complicated objects... or maybe this array is very big and even temp array of zillion ints in unacceptable, then ok, I'm wrong :) – Kamil Mikolajczyk Mar 04 '15 at 12:14