0

An interview question:

Given a NxN board with all cells set to 0, mark M (M < NxN) cells to 1. The M cells should be chosen from all cells with equal probability.

E.g. Mark 30 cells in a 10x10 board, then the probability for a cell to be chosen is 0.3.

My idea is to iterate all cells and on each cell compute a random number in range [1-100], mark the cell to 1 if the number is less than or equal to 30.

The interviewer is not impressed by this solution. Any good idea? (You can use any language)

Lee
  • 535
  • 5
  • 19
  • 1
    The solution you proposed will mark approximately 30 cells. However, the interviewer probably wanted you to mark exactly 30 cells. You could put the numbers from 1 to 100 into an array and shuffle the array. Then use the array as your source of random numbers, and your method marks exactly 30 cells. – user3386109 Nov 21 '14 at 06:27
  • @candu I don't think this is a duplicate because of the 2-d mapping aspect, but the link is very useful. – pjs Nov 22 '14 at 00:34
  • 1
    @pjs The 2D structure doesn't enter into the solution at all. – David Eisenstat Nov 22 '14 at 04:29
  • @DavidEisenstat I disagree. – pjs Nov 22 '14 at 05:47
  • @DavidEisenstat is correct. The problem is *exactly* the same: choose `M` different integers at random from the range `[0, N*N)`. Mapping these integers onto specific cells is trivial, and has nothing to do with the underlying algorithm. – candu Nov 22 '14 at 20:35
  • Thanks DavidEisenstat for the link and pjs btilly for answer. – Lee Nov 24 '14 at 03:15

2 Answers2

3

Put 70 zeros (NxN - M) and 30 ones (M) into a vector. Shuffle the vector. Iterate through and map each index k to 2-d indices via i = k / 10 and j = k % 10 for your example (use N as the divisor more generally).

ADDENDUM

After checking out @candu's link, I decided to give that approach a try. Here's an implementation in Ruby:

require 'set'

# implementation of Floyd's uniform subset algorithm for
# values in the range [0,n).
def generateMfromN(m, n)
  s = Set.new
  ((n-m)...n).each {|j| s.add?(rand(j+1)) || s.add(j)}
  s.to_a
end

#initialize a 10x10 array of zeros
a = Array.new(10)
10.times {|i| a[i] = Array.new(10,0)}

# create an array of 10 random indices between 0 and 99,
# map each index to 2-d indices, and set the corresponing
# element to 1.
generateMfromN(10,100).each {|index| a[index/10][index%10] = 1}

# show the results
a.each {|v| puts v.to_s}

This produces results such as...

[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

and appears to require only O(M) work for Floyd's algorithm, since on each of M iterations an element always gets added to the set.

If M is bigger than N*N/2, initialize the array with 1's and randomize placement of zeros instead, as suggested by @btilly.

pjs
  • 18,696
  • 4
  • 27
  • 56
  • Thanks. I was not aware of "shuffle" at that point. This should be a good solution. – Lee Nov 24 '14 at 03:11
1

This can be done in expected running time O(m).

First let's deal with the case where we need at most half the board. So m <= n*n/2. For this case we can keep choosing random points and changing their values, throwing away and we chose before, until we have m of them. The probability of throwing away the next random choice is never more than half, so the number of random choices needed is at worst 2 m = O(m).

In the case where we need more than half the board, it takes time O(m) to flip every cell to 1, and then we use the previous solution to find n*n - m cells to turn back to 0.

btilly
  • 43,296
  • 3
  • 59
  • 88
  • Thanks. Actually when I failed to figure out a more correct solution the interviewer gave me hint that similar to your solution. – Lee Nov 24 '14 at 03:13
  • @Lee I am not surprised. The solution that I offered does not have guaranteed performance, but has the best possible average performance. By contrast the one you accepted has running time `O(n)` and in the case where `m = O(n)` has significantly worse constants. Were I the interviewer and I was given that solution, I would have pushed for a better one. – btilly Nov 24 '14 at 17:54
  • You are kidding yourself that this is better than a shuffle. In fact it is worse. With a shuffle, you only need to perform m swaps. And then you can stop. Your approach has a minimum of m samples, but possible more. – David Heffernan Nov 02 '15 at 23:19
  • @DavidHeffernan In order to perform a shuffle you need to initialize a vector of numbers to be shuffled. Assuming that the matrix in question is a bit-vector, that data structure will be several times larger than the matrix. If m is much smaller than n, creating it takes several times longer than this algorithm's average time to finish. That said, shuffling just enough to produce the first m values always requires exactly m operations. And making random-seeming choices is expensive. Which one is actually best will depend on circumstances. – btilly Nov 04 '15 at 00:32