Make 2N random selections from size N list whilst avoiding sequential duplicates

Question

Given a list of N objects, how do I make 2N selections in a random order such that no object is chosen twice in succession? (All N objects must be chosen twice.)

EDIT start: My goal is to present a list of words, once on the left and once on the right. No two consecutive trials are to have the same word.

Really I am asking if there is a known technique for doing this, or can someone think of a way to do it without the fiddle. EDIT end

My attempt is shown below, and may be useful as a template for testing.

The premise is to generate randomised indices and then check for repeats. When a repetition is found, swap the first repetition with the value before. Finally, if the lower two elements are the same, swap the 2^nd and 3^rd.

<script type="text/javascript">
    function init() {
        for (rep = 0; rep < 100000; rep++) {
            var indices = [0,0, 1,1, 2,2];
            shuffle(indices);

            for (var i = indices.length - 1; i > 1; i--)
                if (indices[i - 1] == indices[i]) {
                    var tmp = indices[i - 1];
                    indices[i - 1] = indices[i - 2];
                    indices[i - 2] = tmp;
                }

            if (indices[0] == indices[1]) {
                var tmp = indices[1];
                indices[1] = indices[2];
                indices[2] = tmp;
            }

            // test
            for (i = indices.length - 1; i > 1; i--)
                if (indices[i - 1] == indices[i])
                    rep = 1E8;  // fail
        }

        // **EDIT2:** BAD BAD BAD mistake in the check code!!!  Sorry!
        dbg.innerHTML = (rep >= 1E8) ? "Oh bother." : "OK";
    }

    function randomInt(max) { return Math.floor(Math.random() * max); }
    function shuffle(o) { for (var j, x, i = o.length; i; j = randomInt(i), x = o[--i], o[i] = o[j], o[j] = x); return o; }
</script>

<div>
    <span id="dbg" />
</div>

The failure of the method to deal with the lowest two elements is common to the alternative method of selecting from a reducing list. One potentially ends up with two identical elements remaining.

(Note, the method presented is not recommended for use since may not have a uniform random distribution.)

EDIT start: The experiment is supposed to present words in a "random" order. My method below has a 'fiddle' with the last two numbers just seems wrong - at the very least nasty.

(I agree that the overall sequence is not truly random. I am not a mathematician, just trying to code the experiment.) EDIT end

(Shuffle code taken from Jeff's answer.)

If the order is truly random, then it makes no sense to ask that "no object is chosen twice in succession" — Mitch Wheat, Feb 17 '16 at 03:28
I've edited the question with the goal and the motivation for asking. As I say, what I really want to know if there is a better way. — Andrew Malcolm, Feb 17 '16 at 05:32
"One potentially ends up with two identical elements remaining". How so? — n. m. could be an AI, Feb 17 '16 at 05:47
I imagine the source list initialised with [0,0, 1,1, 2,2]. Select elements and _move_ them to the destination, such that the one selected doesn't equal the previously chosen. A valid such selection is 0,1,0,1, leaving two 2's in the source. The next choice would be a 2, with only another 2 in the source list. — Andrew Malcolm, Feb 17 '16 at 06:30
The "fiddle" with the last two elements could be avoided if you treated the buffer as circular; then the element "before" the first element is the last element, and it is no longer an exceptional case. However, I think @btilly's answer produce a less biased distribution, although I lack a proof. — rici, Feb 17 '16 at 15:25
@rici Brilliant answer! Can you explain why it does not potentially cause a match of the last item with the second to last? The ever so slight performance penalty is a small price to pay for removing a fiddle, _IMHO_. If you wish please submit it as an answer. — Andrew Malcolm, Feb 18 '16 at 01:33
@rici ... I eventually figured out if the bottom two match, and there are only two of each item, swapping the first with the last works as the only other match is in the 2nd position. Doh! I honestly never thought of this - the result of the 'twice' being a special case. — Andrew Malcolm, Feb 19 '16 at 21:39

btilly · Answer 1 · 2016-02-17T22:55:42.080

1

Make the list of 2n elements. Walk through it. If you find a repeat at positions i-1, i then swap the ith element with whatever is in a random position other than i, i-1, i-2.

(Sigh, examining the proof found a mistake. Some swaps can't be done without creating an early pairing. You can fix that by only swapping down the road. But now there is 1/n that you'll wind up with a final pair that can't be fixed. But it will still be close to random.)

~~If your random number generator is truly random, this will give a perfectly random distribution.~~

Here is a sketch of a proof by induction on i that after the ith step all distributions with no repeats up the position pair (i-1, i) are equally likely. This result with i = 2n-1 would show that the algorithm works.

Let p(i) be the probability of a possible distribution showing up after the ith step. This will be well-defined for i if the result holds for i.

Next it is trivially true for i = 0 because we start with a random shuffle. So p(0) is well-defined.

After the first step, each possible distribution could have been arrived at by having been the result of the shuffle with probability p(0), or by having arrived at a duplication in (0, 1) which was resolved with a swap to this distribution. The second possibility occurred with probability p(0)/(n-2). Therefore the result is true for 1 and p(1) = p(0) + p(0)/(n-2).

For i in 2, ..., 2n-1 you apply the same argument as for i=1 except that p(i) = p(i-1) + p(i-1)/(n-3).

edited Feb 17 '16 at 22:55

answered Feb 17 '16 at 08:58

btilly

43,296
3
59
88

... "other than `i-2, i-1` **`or i`**". Btw, can you prove that this is unbiased? – rici Feb 17 '16 at 15:27
@rici Oops, or `i`. I fixed and added a sketch of the proof. – btilly Feb 17 '16 at 18:17
If you find a repeat at `i, i+1`, can't you just swap `i` with `i-1`? And if `i==0`, swap the first element with the last. The thought here is that if `i == i+1`, then swapping `i` and `i-1` resolves the sequential equality condition, and can't possibly create another one because there are only two of any item in the list. – Jim Mischel Feb 17 '16 at 18:51
@JimMischel That will produce an answer, but does not produce all possible answers with equal likelihood. In particular answers with gaps of 2 will occur more often than they should. – btilly Feb 17 '16 at 19:38
@rici I was wrong. It isn't perfect. It is close. – btilly Feb 17 '16 at 22:56
@btilly that was my suspicion. I was just about to test it. – rici Feb 18 '16 at 00:08
@btilly I'm sorry but don't understand the idea. Swapping with 'other than `i-2,i-1,or i`' will allow repeats in the already processed list, and only allowing choices below `i-2`, doesn't work, _in my implementation_. – Andrew Malcolm Feb 18 '16 at 01:05
Swapping with other than those 3 will never cause a repeat because you're taking away a possible repeat and trading it for an element that is definitely not a repeat (since it's partner is at `i-1`). The other version would be to swap `i` for something larger. This will work fine until you reach the end of the list. The odds of this are low, and now you can swap the one at the end for anything below `i-2` for an *almost* unbiased implementation. – btilly Feb 18 '16 at 01:18
The value swapped in may match the next value in the list. E.g. start with `[2,1,2,0,0,1]`, when `i=4` swap with element at position 1 to get `[2,0,2,0,1,1]`. If we swap the `i-1`th value and check against `i-1,i,i+1` it works. – Andrew Malcolm Feb 18 '16 at 03:01
@btilly **SORRY** if you used my code to check! I stuffed up BIG time! Honest mistake, but I am terribly sorry if you did use my code. – Andrew Malcolm Feb 18 '16 at 03:06
@btilly: It's not *that* close. For N==6, there are 2,631,600 possible outcomes, so the expected probability for each one is 3.8E-07. Exhaustive search (*not* Monte Carlo) found a range of probabilities from 2.489E-07 to 4.215E-07. I suppose for larger N, the range of probabilities shrinks. – rici Feb 19 '16 at 00:40

rici · Accepted Answer · 2016-02-19T05:57:03.457

After spending way too much time thinking about this problem [Note 1], I couldn't come up with an unbiased random generation algorithm other than the standard rejection algorithm. (Generate an unbiased sample of shuffle sequences of {0, 0, 1, 1,..., N-1, N-1} and reject the sequence if it contains two consecutive elements.)

Fortunately, the rejection algorithm in this case is not awful, because more than one-third of the total sequences qualify, and you can usually reject them before they are fully generated. So you would expect to generate on average fewer than two rejected sequences before finding one which works, and the possibility of early rejection means the expected time to generate each sequence of length 2N might be less than 4N, which means that it may well be faster than generating a sequence and doing another pass over it to "fix" the repetitions.

The sketch for the assertion that more than one-third of sequences qualify:

The number of 2N sequences (including repetitions) is (2N)!/2^N, based on the standard combinatoric counting formula. I'll call that A(n). We can easily see that A(n) is n(2n−1)A(n−1). (The first term is n rather than 2n because the denominator in A(n) has one more 2.)
The number of 2N sequences without repetitions (a(n), an equally arbitrary name) is given by the recursion a(n) = n(2n−1)a(n−1) + n(2n−1)a(n−2) [Note 2]. This recursion is very similar, but it has a relatively small second term.
It's clear than a(n) < A(n) because it is counting a subset of the same sequences. Also, A(n)/A(n−1) = n(2n−1) while a(n)/a(n−1) > n(2n−1). So a(n) grows slightly more rapidly than A(n) but it can never catch up. Since A(2) is 90 and a(2) is 30, we can assert that 1 < A(n)/a(n) ≤ 3 for all n ≥ 2. (The numbers get big very rapidly. According to OEIS, a(16) is 1453437879238150456164433920000, so A(16)/a(16) is 2.762455825573288. My guess is that the ratio will converge to some value not too far from that, but I did nothing to validate that guess. If I were to go way out on a limb, I might speculate that the limit of the ratio will turn out to be e, but the similarity might be entirely coincidental.)

Here's some not-very-tested javascript code which randomly shuffles a vector to produce a repetition-free result. (You need to initialize the vector once, but you'll get just as random a result starting with the previous shuffle as reinitializing, so I'd strongly suggest just initializing once.)

function reshuffle(indices) {
  for (var i = 0, len = indices.length; i < len;) {
    var j = i + Math.floor(Math.random() * (len - i));
    var tmp = indices[j]; indices[j] = indices[i]; indices[i] = tmp;
    if (i > 0 && indices[i - 1] == indices[i]) i = 0;
    else ++i;
  }
  return indices;
}

On a quick test, with the addition of a loop counter, the loop executed an average of 22.9 times to find a random 12-element vector, and 346 times to find a random 200-element vector.

Notes

Well, a few hours. But I do have other things to do.
I eventually managed to prove this, but I'm not going to write the proof here because of the lack of LaTeX; the sequence is A114938 in the OEIS and you'll find the same recursion there.

Jim Mischel · Answer 3 · 2016-02-17T18:59:01.103

One way would be to do a Fischer-Yates shuffle of N items, and return them sequentially. Then shuffle the array again and return them sequentially. The only possibility of returning the same item twice in succession is if the last item from the first shuffle is the first item in the second shuffle. That's easily overcome:

array = shuffle(array);
for each item in array
    return item;

lastItem = array[last];
array = shuffle(array);
if (array[0] == lastItem)
    swap(array[0], array[1]);
for each item in array
    return item;

If you can't shuffle the array, you could easily build an index array and shuffle it, instead.

It's unclear to me if this will solve the problem in the way you want. From your description, it seems like it's okay to have a list that contains duplicates, as long as the duplicates aren't sequential. My solution doesn't produce a duplicate until all other items are produced.

That will produce distributions that meet the requirement, but cannot produce most of them. — btilly, Feb 17 '16 at 19:40
@Jim Mischel This in essence divides the experiment into two blocks, _which is desirable in some occasions_, but unfortunately not this one. — Andrew Malcolm, Feb 18 '16 at 01:09

Make 2N random selections from size N list whilst avoiding sequential duplicates

3 Answers3

Notes

Linked