How to select value from different ranges with equal probability

Question

Given several ranges, I want to select each value with equal probability. Say variable 'a' can take values among { [10,20], 40, [70,100], ... } (given). Each value allowed by these constraints should have the same probability of being selected. How can I get such a random value in C?

जलजनक · Accepted Answer · 2022-04-04T12:30:10.227

Giving each range an equal chance:

1. Let N be the number of ranges you've defined in your problem set: ranges { R₀, R₁, R₂, ..., R_{N-1} }, with indexes starting at 0.
2. Generate a random number RandValue and compute RandValue mod N to pick a range. In C, the modulo operator is %, which gives the integral remainder.
3. Is the picked range just a single number (like 40 in your example)?
   - 3.1 Yes: your random value is that number.
   - 3.2 No, it's a real range: pick a random value within the selected range.

Giving each value across all ranges an equal chance:

1. Let N be the number of distinct values across all ranges.
2. Map each value to an index: values { V₀, V₁, V₂, ..., V_{N-1} }, with indexes starting at 0.
3. Use a hash table for quick lookups. Since each value is inserted only once, this also handles overlapping ranges.
4. Generate a random number RandValue and compute RandValue mod N to pick a value index.
5. Look up the value mapped to that index in the hash table.

Also note that the hash table could become huge if the ranges are large. In that case, merge overlapping or consecutive ranges (if any), keep them in a list (an array of structs) sorted by value index, and assign each range its own block of indexes. Use binary search to find the range that contains the random index; that range's offsets (start/end values and start/end indexes) then give the final value for the index.

PS: This is meant for trivial uses of randomness in C projects; a pseudo-random generator like rand() is deterministic anyway.

Edit: I agree, there is modulo bias. To avoid it, reject rand() results greater than or equal to (RAND_MAX - RAND_MAX % N), since that bound is a multiple of N.

It all depends on how strict you want to be. If you're doing cryptographic stuff, and want to be really sure that you're using good quality randomness and even distributions, then you should be wary of doing anything like modulo on random numbers. See for example this question: https://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator — Popup, Apr 04 '22 at 09:57
Using a hash table for such a trivial mapping is completely inappropriate. You could just use a contiguous array to index into the final values. That is very inefficient in memory use, but still better than a hash table. — Jakob Stark, Apr 04 '22 at 11:24
@JakobStark `C` Arrays are the simplest form of hash-tables. — जलजनक, Apr 04 '22 at 11:26
A hash table uses a hash function; at least, that is the common definition of it. If you use the identity function as the hash, you can technically call an indexed array a hash table. But why bring in the unneeded concept here? — Jakob Stark, Apr 04 '22 at 11:31

Popup · Answer 2 · 2022-04-04T12:23:38.083

Simple solution:

do {
    r = rand();
} while (!is_in_range(r));

It's not at all efficient, and in particular its running time is not bounded. But it should work.

And sometimes simple and stupid solutions are good enough.

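(Once you start doing things like r=rand()%limit;, you start introducing skewed probabilities whenever RAND_MAX+1 is not a multiple of limit. Imagine doing r=rand()%((RAND_MAX/2)+2); with the usual odd RAND_MAX: every result below RAND_MAX/2 can come from two different rand() outputs, but RAND_MAX/2 itself from only one, so anything below RAND_MAX/2 is twice as likely as RAND_MAX/2. See this answer for more detail.)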

To improve performance, one could do something like what @Jakob Stark hinted at:

for (limit = 1; limit <= top_of_range; limit <<= 1)
    ;  // Find the smallest power of two larger than top_of_range
do {
    r = rand() % limit;
} while (!is_in_range(r));

It's still not guaranteed to run in finite time, though...

If you already know that all values in all of the ranges are below 2^n, you could throw away all but the n least significant bits from `rand()`. E.g. if all values are below 256 you can use `r = rand() & 0xff`. This could increase the performance dramatically. — Jakob Stark, Apr 04 '22 at 11:15
