1

I have been trying, with no success, to perform a partial fisher-yates shuffle directly for the first N elements of a 2D array. I know I could transform the 2D array into a 1D array, perform the shuffle and then turn it back, but if possible I would like to avoid this.

The problem in my implementation is, I think although I have not proved it, that the last elements of each row of my array are not correctly shuffled. In fact, supposingI have a m*n array,of which the first N are equal to 1 and the rest m*n - N are equal to 0. When I perform my shuffle of the first N elements of the array, about 67% of the time the elements at the end of each row: array[i][end] are equal to 1, which seems to me to be too many times.

Here is my implementation along with a driver code that can be run to show what the problem is:

void swap(int **a, int i, int j, int iNew, int jNew) 
{
  int temp = a[i][j]; 
  a[i][j] = a[iNew][jNew]; 
  a[iNew][jNew] = temp;
}

void fisher_yates_shuffle(int n, int nbLines, int nbColumns, int **a) 
{
    for (int i = 0; i < nbLines; i++)
    {
        for (int j = 0; j < nbColumns; j++)
        {
            swap(a, i, j, i+rand()%(nbLines-i), j+rand()%(nbColumns -j)); // swap element with random later element

            n--;

            if(n<=0)
                break;
        }
        if(n<=0)
            break;
    }
}

int main(int argc, char const *argv[])
{
    int count1 = 0, count2 = 0, count3 = 0, count4 = 0, N = 10, k = 100000;
    srand(time(NULL));

    //Short example of a 5 by 5 array, with N = 10 first elements equal to 1
    //that need to to be shuffled among all of its elements.
    while(k > 0)
    {
        //allocate
        N = 10;
        int **a = (int**)malloc(sizeof(int*) * 5);
        for(int i = 0; i < 5; i++)
            a[i] = (int*)malloc(sizeof(int) * 5);

        //initialize
        for(int i = 0; i < 5; i++)
        {
            for(int j = 0; j < 5; j++)
            {
                if(N > 0)
                    a[i][j] = 1;
                else
                    a[i][j] = 0;
                N--;
            }
        }

        //shuffle
        fisher_yates_shuffle(10, 5, 5, a);

        //count how many times the last element of each row is equal to 1.
        if(a[1][4] == 1)
            count1++;
        if(a[2][4] == 1)
            count2++;
        if(a[3][4] == 1)
            count3++;
        if(a[4][4] == 1)
            count4++;

        //destroy memory allocated.
        for(int i = 0; i < 5; i++)
            free(a[i]);
        free(a);

        k--;
    }

    //print approximate results.
    printf("%d %d %d %d\n", (count1 * 100)/100000, (count2 * 100)/100000, (count3 * 100)/100000, (count4 * 100/100000));

    return 0;
}

I know it doesn't look very good and there must be a more efficient way of doing it. Maybe there is a different, equally efficient algorithm to shuffle the first N elements of a 2D array like that?

Desperados
  • 434
  • 5
  • 13
  • 2
    Apparently `a` is an array of pointers, but you should clarify that by [edit]ing the question. Also, it would help to see what the `swap` function does. In summary, this question needs a [mcve]. – user3386109 Nov 15 '18 at 23:34
  • I didn't think it made any difference for my question. As far as the way to approach the algorithm is concerned. – Desperados Nov 15 '18 at 23:36
  • 1
    Role of `int n` unclear. I'd expect `n` to be reset each `for (int i = 0; i < nbLines; i++)` iteration. – chux - Reinstate Monica Nov 15 '18 at 23:45
  • 2
    Given the implementation in your snippet, you could just use a plain array of n*m elements. BTW, have you properly used `srand`, somewhere in your code? – Bob__ Nov 15 '18 at 23:48
  • 1
    Instead of rand(), which produces randoms in range 0..RAND_MAX, and modulo operator, which does not provide even distribution, you should use better random function and better distribution, like those from `` header, but for C instead for C++. – Dialecticus Nov 15 '18 at 23:50
  • I editted the answer to show a quick a example of the problem. @Dialecticus I don't really see how rand could be the problem, in my opinion the choice of indexes that can be swapped is the issue. Since when I reach the end of a row, this element can only be swapped with the other elements that are in the same column (the last one). – Desperados Nov 15 '18 at 23:59
  • 1
    @Dialecticus Given that the number of lines is 5 and that `RAND_MAX` is likely something like `2147483647`, I'd bet quite a sum that the modulo operator doesn't introduce any measurable problems with the distribution. – Andrew Henle Nov 16 '18 at 00:06
  • @AndrewHenle oh? try [`rand() % 7`](https://stackoverflow.com/q/7866754/1553090) then. – paddy Nov 16 '18 at 00:09
  • 1
    Suggestion: put a `printf` on the first line of the `swap` function and print the four indexes that were passed to the function. – user3386109 Nov 16 '18 at 00:22
  • 1
    @paddy What's your point? A poor `rand()` implementation produces poor results? Note that even your example doesn't effect the *distribution* of results after the modulo operator. – Andrew Henle Nov 16 '18 at 00:32
  • 1
    @user3386109 Forgot to add another break outside of the first loop, this is why the indexes are not correct, I will edit it now and you could check again, the problem is that when j = 4 jNew = 4 necessarily.This kind of shuffle usually only ignores the items already shuffled in each iteration, but the way I implement for a 2D array, it ignores the items shuffled plus the whole column of all the items before the current one. This is the problem. – Desperados Nov 16 '18 at 00:35
  • If you take out the 2 `if(n<=0) break;` from `fisher_yates_shuffle()`, does it work as desired? The point being: use of `n` is distorting the shuffle. – chux - Reinstate Monica Nov 16 '18 at 01:32
  • @chux Not really, it just continues by unnecessarily shuffling random elements. Maybe it decreases somewhat the probability of an end row element being equal to 1 if said end row element is located after the Nth element of the array. However end row elements that are before that continue to have the same behavior. – Desperados Nov 16 '18 at 01:38

1 Answers1

3

Expanding my previous comment, you could shuffle your "array" in a single loop:

void fisher_yates_shuffle(int n, int nbLines, int nbColumns, int **a) 
{
    for (int i = 0; i < n; i++)
    {
        int j = i + rand() % (nbLines * nbColumns - i);
        swap(a, i / nbColumns, i % nbColumns, j / nbColumns, j % nbColumns); 
    }
}

Note that there are far better ways to implement and pass around a matrix in C and that your swap function should return void, not int.

Bob__
  • 12,361
  • 3
  • 28
  • 42
  • This works perfectly, random positions get values of 1 with more or less the same probability. My "array" shuffle works now, thanks. I would also appreciate it if you could explain in more detail your comment about how I could pass my matrix into my function more efficiently – Desperados Nov 16 '18 at 00:46
  • 2
    @Desperados See e.g. https://stackoverflow.com/questions/42094465/correctly-allocating-multi-dimensional-arrays It's a long Q&A, but it's worth reading. – Bob__ Nov 16 '18 at 00:49
  • 1
    I think indexes (i and j) have to be divides by nbColumns so that the algorithm may be applicable to 'arrays' that are not square (n by n). I think my observation is correct, maybe you want to consider editing your answer or explaining to me if I am wrong. – Desperados Nov 21 '18 at 04:45
  • @Desperados Yes, you are correct. Good catch, thanks. – Bob__ Nov 21 '18 at 07:12