3

Given an integer range R = [a, b] (where a >=0 and b <= 100), a bias integer n in R, and some deviation b, what formula can I use to skew a random number generator towards n?

So for example if I had the numbers 1 through 10 inclusively and I don't specify a bias number, then I should in theory have equal chances of randomly drawing one of them.

But if I do give a specific bias number (say, 3), then the number generator should be drawing 3 a more frequently than the other numbers.

And if I specify a deviation of say 2 in addition to the bias number, then the number generator should be drawing from 1 through 5 a more frequently than 6 through 10.

What algorithm can I use to achieve this?

I'm using Ruby if it makes it any easier/harder.

Phrogz
  • 296,393
  • 112
  • 651
  • 745
MxLDevs
  • 19,048
  • 36
  • 123
  • 194
  • Pretty small. No bigger than 100. – MxLDevs May 07 '12 at 16:41
  • sorry for wrong answer; missed that it's an integer. do you just want some "arbitrary" transform or does it have to have well-defined properties (like mean and sd equal to those given for a large enough sample)? – andrew cooke May 07 '12 at 16:42
  • Can you use a uniform distribution if there is no bias and a normal distribution if there is? – zimdanen May 07 '12 at 16:42
  • @andrewcooke arbitrary transform is fine. I just want the number generator to pick certain values more often. As long as I'm getting more 3's in my example it's cool. Or, with the deviance, more 1's through 5's. Would be nice if 3 was still the higher one in that case though. – MxLDevs May 07 '12 at 16:44
  • @zimdanen Normal would be too skewed towards the "bias number". One option is a mixture of a uniform over R and a uniform(?) over the "bias range", depending on what properties the OP wants. – Danica May 07 '12 at 16:44
  • why is normal too skewed? that depends entirely on the sd you give - you can choose it however you want. – andrew cooke May 07 '12 at 16:57
  • The problem with the solution you provided, andrew cooke, is that it's unimodal, where Keikoku suggested that he/she was looking for possibly multimodal bias. – Jérémie May 08 '12 at 21:02

3 Answers3

3

i think the simplest route is to sample from a normal (aka gaussian) distribution with the properties you want, and then transform the result:

  • generate a normal value with given mean and sd
  • round to nearest integer
  • if outside given range (normal can generate values over the entire range from -infinity to -infinity), discard and repeat

if you need to generate a normal from a uniform the simplest transform is "box-muller".

there are some details you may need to worry about. in particular, box muller is limited in range (it doesn't generate extremely unlikely values, ever). so if you give a very narrow range then you will never get the full range of values. other transforms are not as limited - i'd suggest using whatever ruby provides (look for "normal" or "gaussian").

also, be careful to round the value. 2.6 to 3.4 should all become 3, for example. if you simply discard the decimal (so 3.0 to 3.999 become 3) you will be biased.

if you're really concerned with efficiency, and don't want to discard values, you can simply invent something. one way to cheat is to mix a uniform variate with the bias value (so 9/10 times generate the uniform, 1/10 times return 3, say). in some cases, where you only care about average of the sample, that can be sufficient.

andrew cooke
  • 45,717
  • 10
  • 93
  • 143
  • For the first approach, that means if I give mean=2 and sd = 3, that effectively eliminates numbers like 8 from being chosen right? Maybe I could generate another random number to choose whether to discard? – MxLDevs May 07 '12 at 16:49
  • no, you'll get 8. that's (8-2)/3 = 2 sd away from the mean, so there's no problem; it just won't be very common. in theory the normal distro goes to infinity (in practice box muller doesn't match that, but ruby probably has a normal/gaussian random generator that works better). see edit. – andrew cooke May 07 '12 at 16:55
  • I use `rnd = [1,1,1,2].sample` for a 1/4 chance of getting a 2. – Kris Oct 05 '12 at 11:23
0

For the first part "But if I do give a specific bias number (say, 3), then the number generator should be drawing 3 a more frequently than the other numbers.", a very easy solution:

def randBias(a,b,biasedNum=None, bias=0):
   x = random.randint(a, b+bias)
   if x<= b:
       return x
   else:
       return biasedNum

For the second part, I would say it depends on the task. In a case where you need to generate a billion random numbers from the same distribution, I would calculate the probability of the numbers explicitly and use weighted random number generator (see Random weighted choice )

Community
  • 1
  • 1
ElKamina
  • 7,747
  • 28
  • 43
0

If you want an unimodal distribution (where the bias is just concentrated in one particular value of your range of number, for example, as you state 3), then the answer provided by andrew cooke is good---mostly because it allows you to fine tune the deviation very accurately.

If however you wish to make several biases---for instance you want a trimodal distribution, with the numbers a, (a+b)/2 and b more frequently than others, than you would do well to implement weighted random selection.

A simple algorithm for this was given in a recent question on StackOverflow; it's complexity is linear. Using such an algorithm, you would simply maintain a list, initial containing {a, a+1, a+2,..., b-1, b} (so of size b-a+1), and when you want to add a bias towards X, you would several copies of X to the list---depending on how much you want to bias. Then you pick a random item from the list.

If you want something more efficient, the most efficient method is called the "Alias method" which was implemented very clearly in Python by Denis Bzowy; once your array has been preprocessed, it runs in constant time (but that means that you can't update the biases anymore once you've done the preprocessing---or you would to reprocess the table).

The downside with both techniques is that unlike with the Gaussian distribution, biasing towards X, will not bias also somewhat towards X-1 and X+1. To simulate this effect you would have to do something such as

def addBias(x, L):
   L = concatList(L, [x, x, x, x, x])
   L = concatList(L, [x+2])
   L = concatList(L, [x+1, x+1]) 
   L = concatList(L, [x-1,x-1,x-1])
   L = concatList(L, [x-2])
Community
  • 1
  • 1
Jérémie
  • 1,353
  • 9
  • 19