3

Suppose I have a big 2D array of values in the range [0,1] where 0 means "impossible" and 1 means "highly probable".

How can I select a random set of points in this array according to the probabilities described above ?

Nicolas Repiquet
  • 9,097
  • 2
  • 31
  • 53
  • Duplicate. Many variations of this question have been asked, here is one. http://stackoverflow.com/questions/2140787/select-random-k-elements-from-a-list-whose-elements-have-weights – Samsdram Feb 17 '11 at 14:51

2 Answers2

5

One way to look at the problem is to ignore (for the moment) the fact that you're dealing with a 2d grid. What you have are a set of weighted items. A standard way of randomly selecting from such a set is to:

  1. sum the weights, call the sum s
  2. select a uniform random value 0 <= u < s
  3. iterate through the items, keeping a running total t of the weights of the items you've examined
  4. as soon as t >= u, select the item you're currently looking at (the one whose weight you just added).

This can be modified to make multiple selections without replacement by adding the following steps:

  • After each selection, deduct the weight of the selected item from s (if your weights are floating point and stability is an issue, you might prefer to recalculate it from scratch, at least occasionally).

  • repeat from 2, but in step 3 skip items that have been previously selected.

If summing the weights is infeasible or undesirable (which it may be if your array is particularly large) there are other options. The first that comes to mind is rejection sampling, which is a fairly broad topic so I'll just refer you to google and wikipedia on that one, as their coverage is pretty good.

EDIT: Forgot to come back to the fact that you have a 2D array. You can speed things up significantly by pre-computing (MIPMAP-style) the sums of weights of a hierarchy of regions in the map, so you can skip quickly to the location of the actual selected weight.

mokus
  • 3,490
  • 16
  • 22
  • My array is indeed really huge, so the first solution you describe don't seems practicable. Rejection sampling look promising though. Thanks for your help. – Nicolas Repiquet Feb 18 '11 at 07:43
  • @Nicholas Repiquet: I wouldn't necessarily write off the first entirely either. The cost of summing the array is comparable to that of creating it in the first place, and with some work can be done at the same time (and kept up-to-date if the array changes frequently). But if you can come up with a relatively tight upper bound or if you're not drawing a large number of samples (and therefore many potential rejections are ok) then yes, rejection is a pretty good way to go. – mokus Feb 18 '11 at 14:20
  • Actually upon further reflection I recall another technique worth looking into - Reservoir sampling. It requires a single traversal of the array with one random variate per element. The wikipedia article is pretty concise but gets the point across, at least for the unweighted case. With a little math it's not too hard to generalize to weighted sampling. – mokus Feb 18 '11 at 14:37
  • The simple and efficient way to do a hierarchy of regions is to make each region a range of indices, which doesn't in itself depend on it being a 2D array. But when it is, at the higher levels of the hierarchy they will be ranges of rows, and at the lower levels ranges of elements of a single row. – Stewart Jul 05 '11 at 12:15
0

the code:

  count = 0
  maxPointsInSet = 100
  foreach(point in array){
      if(randomWithChacnce(point.value))) {
         addToSet(point)
         count++
      }
      if(count == maxPointsInSet)
         break;
  }

  function randomWithChacnce(int num){
    random = a randomized number between 0 to 1 // or random from 1 to 100 num*100
    if(num >= random)
     return true;
    return false
  }

if u need it in any specific language just ask me

The GiG
  • 2,571
  • 2
  • 28
  • 29