3

I need a simple way to randomly select a letter from the alphabet, weighted on the percentage I want it to come up. For example, I want the letter 'E' to come up in the random function 5.9% of the time, but I only want 'Z' to come up 0.3% of the time (and so on, based on the average occurrence of each letter in the alphabet). Any suggestions? The only way I see is to populate an array with, say, 10000 letters (590 'E's, 3 'Z's, and so on) and then randomly select an letter from that array, but it seems memory intensive and clumsy.

Kara
  • 6,115
  • 16
  • 50
  • 57
Chris
  • 1,037
  • 15
  • 26

4 Answers4

5

Not sure if this would work, but it seems like it might do the trick:

  1. Take your list of letters and frequencies and sort them from smallest frequency to largest.
  2. Create a 26 element array where each element n contains the sum of all previous weights and the element n from the list of frequencies. Make note of the sum in the last element of the array
  3. Generate a random number between 0 and the sum you made note of above
  4. Do a binary search of the array of sums until you reach the element where that number would fall

That's a little hard to follow, so it would be something like this:

  1. if you have a 5 letter alphabet with these frequencies, a = 5%, b = 20%, c = 10%, d = 40%, e = 25%, sort them by frequency: a,c,b,e,d
  2. Keep a running sum of the elements: 5, 15, 35, 60, 100
  3. Generate a random number between 0 and 100. Say it came out 22.
  4. Do a binary search for the element where 22 would fall. In this case it would be between element 2 and 3, which would be the letter "b" (rounding up is what you want here, I think)
user1118321
  • 25,567
  • 4
  • 55
  • 86
2

You've already acknowledged the tradeoff between space and speed, so I won't get into that.

If you can calculate the frequency of each letter a priori, then you can pre-generate an array (or dynamically create and fill an array once) to scale up with your desired level of precision.

Since you used percentages with a single digit of precision after the decimal point, then consider an array of 1000 entries. Each index represents one tenth of one percent of frequency. So you'd have letter[0] to letter[82] equal to 'a', letter[83] to letter[97] equal to 'b', and so on up until letter[999] equal to 'z'. (Values according to Relative frequencies of letters in the English language)

Now generate a random number between 0 and 1 (using whatever favourite PRNG you have, assuming uniform distribution) and multiply the result by 1000. That gives you the index into your array, and your weighted-random letter.

Shaggy Frog
  • 27,575
  • 16
  • 91
  • 128
  • I like this solution, it's quite elegant, but I think I am gonna have to go with 118321's answer, even though you both deserve an accept. None the less, many props! – Chris Jun 07 '12 at 05:36
0

Use the method explained here. Alas this is for Python but could be rewritten for C etc. https://stackoverflow.com/a/4113400/129202

Community
  • 1
  • 1
Jonny
  • 15,955
  • 18
  • 111
  • 232
0

First you need to make a NSDicationary of the letters and their frequencies;

I'll explain it with an example: let's say your dictionary is something like this:

{@"a": @0.2, @"b", @0.5, @"c": @0.3};

So the frequency of you letters covers the interval of [0, 1] this way:

a->[0, 0.2] + b->[0.2, 0.7] + c->[0.7, 1]

You generate a random number between 0 and 1. Then easily by checking that this random belongs to which interval and returning the corresponding letter you get what you want.

you seed the random function at the beginning of you program: srand48(time(0));

-(NSSting *)weightedRandomForDicLetters:(NSDictionary *)letterFreq {

double randomNumber = drand48();

double endOfInterval = 0;
for (NSString *letter in dic){
    endOfInterval += [[letterFreq objectForKey:letter] doubleValue];
    if (randomNumber < endOfInterval) {
        return letter;
    }
}

}

Erfan
  • 143
  • 10