2

The Fisher–Yates algorithm generates unbiased random permutations of a finite sequence. The running time is proportional to the number elements being shuffled.

I want to shuffle a few non-zero elements with a large number of zero elements.

Implementing the Fisher–Yates algorithm with a list would lead to the shuffling process taking too long and requiring too much storage. Most steps in the Fisher--Yates algorithm would simply switch the position of duplicate zero elements.

Does there exists a random shuffle (or alternative) algorithm that:

  1. Leads to unbiased permutations
  2. Does not require the shuffling and storing of all duplicate elements
  • There are approximately 1e4 to 1e6 zero elements per non-zero element. – Lodin Ellingsen Oct 26 '21 at 12:06
  • 2
    With that kind of distribution you could distribute the non-zero values at random indexes. In the rare case of collision, just retry. – trincot Oct 26 '21 at 12:09
  • @trincot A better estimate of the number of non-zero elements is perhaps 1e2 to 1e6. I have more than enough non-zero elements that collision are an issue. In fact, your suggestion of removing samples with collisions is the algorithm I am trying to find an alternative to. As the [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem) illustrates, collision become almost inevitable for systems with large number of elements (even if the probability of a single collision remains small). – Lodin Ellingsen Oct 26 '21 at 12:21
  • What is the array size? – trincot Oct 26 '21 at 12:35
  • Currently the array size is 1e4 to 1e8, but I would prefer to not to use an array of this size. – Lodin Ellingsen Oct 26 '21 at 14:37

1 Answers1

2

Since a Fisher-Yates shuffle produces a random permutation, its inverse is also a random permutation:

  1. For i=1 to n-1:
    1. choose a random number j in [0,i]
    2. swap elements i and j

In this algorithm, though, if you have m non-zero elements, and you start with all of them at the end, then the first n-m iterations are guaranteed to be swapping zeros, so you can just skip those.

Use a hash map instead of an array if you want to avoid storing all the zero elements.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87