If you need to guarantee uniqueness, it may be more efficient to
- Try and generate
n
random floats in [lo, hi]
at once.
- If the length of the unique floats is not
n
, try and generate however many floats are still needed
and continue accordingly until you have enough, as opposed to generating them 1-by-1 in a Python level loop checking against a set.
If you can afford NumPy doing so with np.random.uniform
can be a huge speed-up.
import numpy as np
def gen_uniq_floats(lo, hi, n):
out = np.empty(n)
needed = n
while needed != 0:
arr = np.random.uniform(lo, hi, needed)
uniqs = np.setdiff1d(np.unique(arr), out[:n-needed])
out[n-needed: n-needed+uniqs.size] = uniqs
needed -= uniqs.size
np.random.shuffle(out)
return out.tolist()
If you cannot use NumPy, it still may be more efficient depending on your data needs to apply the same concept of checking for dupes afterwards, maintaining a set.
def no_depend_gen_uniq_floats(lo, hi, n):
seen = set()
needed = n
while needed != 0:
uniqs = {random.uniform(lo, hi) for _ in range(needed)}
seen.update(uniqs)
needed -= len(uniqs)
return list(seen)
Rough benchmark
Extreme degenerate case
# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 2**-50, 1000)
153 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 2**-50, 1000)
495 µs ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 2**-50, 1000)
618 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
More "normal" case (with larger sample)
# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 1, 10**5)
15.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 1, 10**5)
65.7 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 1, 10**5)
78.8 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)