19

I know that there are easy ways to generate lists of unique random integers (e.g. random.sample(range(1, 100), 10)).

I wonder whether there is some better way of generating a list of unique random floats, apart from writing a function that acts like a range, but accepts floats like this:

import random

def float_range(start, stop, step):
    vals = []
    i = 0
    current_val = start
    while current_val < stop:
        vals.append(current_val)
        i += 1
        current_val = start + i * step
    return vals

unique_floats = random.sample(float_range(0, 2, 0.2), 3)

Is there a better way to do this?

Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
Simon
  • 342
  • 1
  • 2
  • 10

8 Answers8

20

Answer

One easy way is to keep a set of all random values seen so far and reselect if there is a repeat:

import random

def sample_floats(low, high, k=1):
    """ Return a k-length list of unique random floats
        in the range of low <= x <= high
    """
    result = []
    seen = set()
    for i in range(k):
        x = random.uniform(low, high)
        while x in seen:
            x = random.uniform(low, high)
        seen.add(x)
        result.append(x)
    return result

Notes

  • This technique is how Python's own random.sample() is implemented.

  • The function uses a set to track previous selections because searching a set is O(1) while searching a list is O(n).

  • Computing the probability of a duplicate selection is equivalent to the famous Birthday Problem.

  • Given 2**53 distinct possible values from random(), duplicates are infrequent. On average, you can expect a duplicate float at about 120,000,000 samples.

Variant: Limited float range

If the population is limited to just a range of evenly spaced floats, then it is possible to use random.sample() directly. The only requirement is that the population be a Sequence:

from __future__ import division
from collections import Sequence

class FRange(Sequence):
    """ Lazily evaluated floating point range of evenly spaced floats
        (inclusive at both ends)

        >>> list(FRange(low=10, high=20, num_points=5))
        [10.0, 12.5, 15.0, 17.5, 20.0]

    """
    def __init__(self, low, high, num_points):
        self.low = low
        self.high = high
        self.num_points = num_points

    def __len__(self):
        return self.num_points

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if index < 0 or index >= len(self):
            raise IndexError('Out of range')
        p = index / (self.num_points - 1)
        return self.low * (1.0 - p) + self.high * p

Here is a example of choosing ten random samples without replacement from a range of 41 evenly spaced floats from 10.0 to 20.0.

>>> import random
>>> random.sample(FRange(low=10.0, high=20.0, num_points=41), k=10)
[13.25, 12.0, 15.25, 18.5, 19.75, 12.25, 15.75, 18.75, 13.0, 17.75]
Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
  • 3
    this is what i would recommend ... set lookup is O(1) ... and random float is easy – Joran Beasley Jul 29 '17 at 23:36
  • For the question in the title, yes. But for the actual question (which has a step size argument), the probability of getting a duplicate in a discrete set like that is not very small so I think @user2357112's suggestion is more efficient. – ayhan Jul 29 '17 at 23:39
  • How better is this than simply `while x in result:`? – Vinícius Figueiredo Jul 29 '17 at 23:44
  • 1
    @ViníciusAguiar: Looking up `x in result` takes time proportional to the length of `result`. If you made `result` a set, it wouldn’t be unsorted. – Ry- Jul 29 '17 at 23:48
  • 1
    @ViníciusAguiar ``while x in result`` does an O(n) linear search. The ``x in seen`` approach does an O(1) search. For a large value of *k* the performance difference is enormous. – Raymond Hettinger Jul 29 '17 at 23:48
  • 1
    Oh, I see, thank you both for the answer, learned something today! – Vinícius Figueiredo Jul 29 '17 at 23:49
  • It is equivalent to the birthday problem with `n` being something like `(2^63)*100`. Unless something egregious would happen in the extremely rare case that duplicates occurred (which I can't imagine), I don't see why direct sampling from a uniform distribution wouldn't be fine. – miradulo Jul 29 '17 at 23:58
  • 1
    @ayhan Nah, the step thing in the question looks like only his attempt to solve his problem, not like what he actually wants. Read his text about it. – Stefan Pochmann Jul 30 '17 at 00:04
  • @Mitch Where does the `*100` come from? You can't have a population larger than 2^64 when you represent the values with 64 bits (as Python's `float` usually are). – Stefan Pochmann Jul 30 '17 at 00:16
  • @Mitch The population is closer to ``2**53`` since double precision floats have a 53-bit mantissa. But yes, the odds of reselection are very low. That said the OP did specifically ask for login like *random.sample()* and for the values to be unique. So this is the answer to the question as asked. What should be done in a real application really depends on why you want unique floats, how you're going to use them, and the cost of a rare duplicate. – Raymond Hettinger Jul 30 '17 at 00:17
  • @StefanPochmann I took into account his example range, sorry a bit confusing - I was just guess-timating. – miradulo Jul 30 '17 at 00:20
  • 2
    @RaymondHettinger And agreed! I don't know, I just get the impression 99% of people who believe they need "unique random floats" don't really need that "uniqueness", so I hope they don't get the wrong idea from this question/answer. Sampling without replacement from a continuous distribution isn't something I've ever come across - granted if you had a pathologically small interval it might be necessary for some reason. – miradulo Jul 30 '17 at 00:22
  • @RaymondHettinger Is it really 53? Not just 52? Because only 52 bits are explicitly stored. Actually I recently wondered how random floats are generated and failed to find out. Do you know how that's done? – Stefan Pochmann Jul 30 '17 at 00:25
  • @RaymondHettinger Hmm, thought more about it. Aren't there 2^52 possible values in [0.5, 1)? And 2^52 more in [0.25, 0.5)? And 2^52 more in [0.125, 0.25)? And so on? The exponent goes from -1 for the interval [0.5, 1) down to -1023. So that's 1023*2^52 possible values even just in [0, 1), no? Meaning about 2^62. – Stefan Pochmann Jul 30 '17 at 00:42
  • The code is in Module/_randommodule.c in the *random_random()* function. Essentially, it generates a 53-bit random integer and then multiplies by ``(1.0/9007199254740992.0)`` where the latter value is ``2.0 ** 53``. The technique assures the random floats are equidistributed. – Raymond Hettinger Jul 30 '17 at 00:45
  • @RaymondHettinger Awesome, thanks. Then you're obviously right, 2^53 possibilities. I still think it *could* use the above 2^62 but I guess people decided it's not worth it... – Stefan Pochmann Jul 30 '17 at 00:55
  • 2
    @StefanPochmann To go beyond 2^53, you need to use bits in the exponent. That would cause the random floats to no longer be equidistributed. – Raymond Hettinger Jul 30 '17 at 02:02
  • 2
    I second what Mitch said: I don't think the code to sample unique floating point numbers from a continuous distribution is ever useful. If you are OK with floats being arbitrarily close together, but they can't be the same, you are likely making some wrong assumptions. – Sven Marnach Aug 03 '17 at 09:31
  • can I get a list of random floats in increasing or decreasing sequence ? – Abdul Haseeb Nov 01 '21 at 19:46
  • @AbdulHaseeb The usual way is to generate the floats and then sort them. This is used for example to model arrival times for customers. – Raymond Hettinger Nov 02 '21 at 01:47
  • Thing is, I'm trying to generate IoT sensors data so they can increase gradually and decrease as well so sort thing will not work in this case. – Abdul Haseeb Nov 02 '21 at 10:56
  • @AbdulHaseeb I recommend that you open a new question. Your question is almost completely unrelated to this thread. Also, comments aren't the right place to ask questions. – Raymond Hettinger Nov 04 '21 at 00:24
9

You can easily use your list of integers to generate floats:

int_list = random.sample(range(1, 100), 10)
float_list = [x/10 for x in int_list]

Check out this Stack Overflow question about generating random floats.

If you want it to work with python2, add this import:

from __future__ import division
Or Duan
  • 13,142
  • 6
  • 60
  • 65
  • This will only work as intended in Python 3.x, because in Python 2.x it will do integer division. You could explicitly write `x / 10.` to make it work on both – Graipher Jul 30 '17 at 12:12
  • 1
    @Graipher Thanks, added python2 compatibility – Or Duan Jul 30 '17 at 13:26
5

If you need to guarantee uniqueness, it may be more efficient to

  1. Try and generate n random floats in [lo, hi] at once.
  2. If the length of the unique floats is not n, try and generate however many floats are still needed

and continue accordingly until you have enough, as opposed to generating them 1-by-1 in a Python level loop checking against a set.

If you can afford NumPy doing so with np.random.uniform can be a huge speed-up.

import numpy as np

def gen_uniq_floats(lo, hi, n):
    out = np.empty(n)
    needed = n
    while needed != 0:
        arr = np.random.uniform(lo, hi, needed)
        uniqs = np.setdiff1d(np.unique(arr), out[:n-needed])
        out[n-needed: n-needed+uniqs.size] = uniqs
        needed -= uniqs.size
    np.random.shuffle(out)
    return out.tolist()

If you cannot use NumPy, it still may be more efficient depending on your data needs to apply the same concept of checking for dupes afterwards, maintaining a set.

def no_depend_gen_uniq_floats(lo, hi, n):
    seen = set()
    needed = n
    while needed != 0:
        uniqs = {random.uniform(lo, hi) for _ in range(needed)}
        seen.update(uniqs)
        needed -= len(uniqs)
    return list(seen)

Rough benchmark

Extreme degenerate case

# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 2**-50, 1000)
153 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 2**-50, 1000)
495 µs ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 2**-50, 1000)
618 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

More "normal" case (with larger sample)

# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 1, 10**5)
15.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 1, 10**5)
65.7 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 1, 10**5)
78.8 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
miradulo
  • 28,857
  • 6
  • 80
  • 93
  • 1
    So the point is that *numpy* is faster than regular Python for numeric work? And that you think there should be a *numpy* answer to a non-numpy question? – Raymond Hettinger Jul 30 '17 at 02:46
  • @RaymondHettinger My point was moreso that if we have a fast way (i.e. `np.random.uniform`) to generate a whole bunch of random floats at once, it is faster to try to do it all at once and _then_ check for dupes than proceed one at a time with `random.uniform`. So yes, I'm not sure how much of a performance gain you would see using `random.uniform` in a list comp. and updating a set (maybe just less set checks). but I think this could be useful for someone? – miradulo Jul 30 '17 at 02:51
  • Yes. Cleaning out dups at the end works just as well. – Raymond Hettinger Jul 30 '17 at 03:20
  • 1
    @RaymondHettinger I tried to flesh out my answer a bit to make the NumPy dependency more clear and added a Python example of what I was trying to say if I was talking in circles before :) thanks! – miradulo Jul 30 '17 at 03:28
4

You could just use random.uniform(start, stop). With double precision floats, you can be relatively sure that they are unique if your set is small. If you want to generate a big number of random floats and need to avoid that you have a number twice, check before adding them to the list.

However, if you are looking for a selection of specific numbers, this is not the solution.

C. Nitschke
  • 199
  • 6
  • 1
    If you're planning on checking for uniqueness before keeping a generated float, I would suggest using a set instead of a list to keep it O(n), as checking a list each time would make it O(n^2) – Niema Moshiri Jul 29 '17 at 23:43
  • 1
    Assurance of uniqueness is a https://en.wikipedia.org/wiki/Birthday_problem , so being "relatively sure that they are unique" isn't sound once the number of selections grows sufficiently large. – Raymond Hettinger Jul 29 '17 at 23:45
  • with the number of floats searched given in the example, the risk of having double values is small. Nevertheless, you are right. Answer updated. – C. Nitschke Jul 29 '17 at 23:53
1
min_val=-5
max_val=15

numpy.random.random_sample(15)*(max_val-min_val) + min_val

or use uniform

numpy.random.uniform(min_val,max_val,size=15)
Joran Beasley
  • 110,522
  • 12
  • 160
  • 179
  • 3
    This doesn't guarantee uniqueness. Try for example `len(set(numpy.random.uniform(1, 1 + 2**-40, size=1000)))` and you'll get only about 887 instead of 1000. – Stefan Pochmann Jul 30 '17 at 00:11
1

As stated in the documentation Python has the random.random() function:

import random
random.random()

Then you will get a float val as: 0.672807098390448

So all you need to do is make a for loop and print out random.random():

>>> for i in range(10):
print(random.random())
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
aviad
  • 52
  • 4
0

more_itertools has a generic numeric_range that handles both integers and floats.

import random

import more_itertools as mit

random.sample(list(mit.numeric_range(0, 2, 0.2)), 3)
# [0.8, 1.0, 0.4]

random.sample(list(mit.numeric_range(10.0, 20.0, 0.25)), 10)
# [17.25, 12.0, 19.75, 14.25, 15.25, 12.75, 14.5, 15.75, 13.5, 18.25]
pylang
  • 40,867
  • 14
  • 129
  • 121
0

random.uniform generate float values

import random

def get_random(low,high,length):
  lst = []
  while len(lst) < length:
    lst.append(random.uniform(low,high))
    lst = list(set(lst))
  return lst
vishal vanpariya
  • 101
  • 2
  • 13