Mark M cells on a NxN board randomly with equal probability

Question

An interview question:

Given a NxN board with all cells set to 0, mark M (M < NxN) cells to 1. The M cells should be chosen from all cells with equal probability.

E.g. Mark 30 cells in a 10x10 board, then the probability for a cell to be chosen is 0.3.

My idea is to iterate all cells and on each cell compute a random number in range [1-100], mark the cell to 1 if the number is less than or equal to 30.

The interviewer is not impressed by this solution. Any good idea? (You can use any language)

The solution you proposed will mark approximately 30 cells. However, the interviewer probably wanted you to mark exactly 30 cells. You could put the numbers from 1 to 100 into an array and shuffle the array. Then use the array as your source of random numbers, and your method marks exactly 30 cells. — user3386109, Nov 21 '14 at 06:27
@candu I don't think this is a duplicate because of the 2-d mapping aspect, but the link is very useful. — pjs, Nov 22 '14 at 00:34
@pjs The 2D structure doesn't enter into the solution at all. — David Eisenstat, Nov 22 '14 at 04:29
@DavidEisenstat is correct. The problem is *exactly* the same: choose `M` different integers at random from the range `[0, N*N)`. Mapping these integers onto specific cells is trivial, and has nothing to do with the underlying algorithm. — candu, Nov 22 '14 at 20:35
Thanks DavidEisenstat for the link and pjs btilly for answer. — Lee, Nov 24 '14 at 03:15

pjs · Accepted Answer · 2014-11-22T00:43:03.813

Put 70 zeros (NxN - M) and 30 ones (M) into a vector. Shuffle the vector. Iterate through and map each index k to 2-d indices via i = k / 10 and j = k % 10 for your example (use N as the divisor more generally).

ADDENDUM

After checking out @candu's link, I decided to give that approach a try. Here's an implementation in Ruby:

require 'set'

# implementation of Floyd's uniform subset algorithm for
# values in the range [0,n).
def generateMfromN(m, n)
  s = Set.new
  ((n-m)...n).each {|j| s.add?(rand(j+1)) || s.add(j)}
  s.to_a
end

#initialize a 10x10 array of zeros
a = Array.new(10)
10.times {|i| a[i] = Array.new(10,0)}

# create an array of 10 random indices between 0 and 99,
# map each index to 2-d indices, and set the corresponing
# element to 1.
generateMfromN(10,100).each {|index| a[index/10][index%10] = 1}

# show the results
a.each {|v| puts v.to_s}

This produces results such as...

[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

and appears to require only O(M) work for Floyd's algorithm, since on each of M iterations an element always gets added to the set.

If M is bigger than N*N/2, initialize the array with 1's and randomize placement of zeros instead, as suggested by @btilly.

Thanks. I was not aware of "shuffle" at that point. This should be a good solution. — Lee, Nov 24 '14 at 03:11

score 1 · Answer 2 · answered Nov 21 '14 at 07:38

1

This can be done in expected running time O(m).

First let's deal with the case where we need at most half the board. So m <= n*n/2. For this case we can keep choosing random points and changing their values, throwing away and we chose before, until we have m of them. The probability of throwing away the next random choice is never more than half, so the number of random choices needed is at worst 2 m = O(m).

In the case where we need more than half the board, it takes time O(m) to flip every cell to 1, and then we use the previous solution to find n*n - m cells to turn back to 0.

answered Nov 21 '14 at 07:38

btilly

43,296
3
59
88

Thanks. Actually when I failed to figure out a more correct solution the interviewer gave me hint that similar to your solution. – Lee Nov 24 '14 at 03:13
@Lee I am not surprised. The solution that I offered does not have guaranteed performance, but has the best possible average performance. By contrast the one you accepted has running time `O(n)` and in the case where `m = O(n)` has significantly worse constants. Were I the interviewer and I was given that solution, I would have pushed for a better one. – btilly Nov 24 '14 at 17:54
You are kidding yourself that this is better than a shuffle. In fact it is worse. With a shuffle, you only need to perform m swaps. And then you can stop. Your approach has a minimum of m samples, but possible more. – David Heffernan Nov 02 '15 at 23:19
@DavidHeffernan In order to perform a shuffle you need to initialize a vector of numbers to be shuffled. Assuming that the matrix in question is a bit-vector, that data structure will be several times larger than the matrix. If m is much smaller than n, creating it takes several times longer than this algorithm's average time to finish. That said, shuffling just enough to produce the first m values always requires exactly m operations. And making random-seeming choices is expensive. Which one is actually best will depend on circumstances. – btilly Nov 04 '15 at 00:32

Mark M cells on a NxN board randomly with equal probability

2 Answers2