Python: Random list of numbers in a range keeping with a minimum distance

Question

Let's say this code random.seed(42) random.sample(range(0,40), 4) Output:[7, 1, 17, 15] What should I change in this code to generate random numbers where the minimum distance between any two numbers in the list will be be at least 10 or more. Something like [0, 10, 25, 39] or [0, 12, 23, 38 ]. Possible duplicate would be this. Thanks.

Do the numbers have to be below 40? – Mad Physicist Aug 19 '18 at 14:25 — Mad Physicist, Aug 19 '18 at 14:25
yeah. the range is 40. You can see range(0, 40) – Humaun Rashid Nayan Aug 19 '18 at 14:26 — Humaun Rashid Nayan, Aug 19 '18 at 14:26

Mark Dickinson · Accepted Answer · 2018-08-20T20:33:56.000

One-line solution for the sorted case

Here's a simple one-liner, that generates all possibilities with equal likelihood:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

A few sample outputs:

[2, 16, 26, 38]
[0, 10, 25, 35]
[2, 12, 25, 36]
[0, 13, 26, 39]
[1, 14, 24, 34]
[1, 11, 29, 39]
[0, 13, 26, 39]
[1, 12, 27, 38]

Outputs are always generated in sorted order; if that's not what you want, you can easily add a shuffle to the result (or see below for a general solution).

Explanation: if [a, b, c, d] is an ordered list satisfying your requirements, then [a, b-9, c-18, d-27] is an ordered sample of length 4 from range(13), and vice versa. So all you need to do is generate samples from range(13), sort them, and then re-add the necessary multiples of 9 to get values that are at least 10 apart.

General unsorted solution

Here's a general solution that doesn't require sorting of the random sample. Instead, we compute the ranks of the elements of the sample and use those to compute the necessary offsets.

import random

def ranks(sample):
    """
    Return the ranks of each element in an integer sample.
    """
    indices = sorted(range(len(sample)), key=lambda i: sample[i])
    return sorted(indices, key=lambda i: indices[i])

def sample_with_minimum_distance(n=40, k=4, d=10):
    """
    Sample of k elements from range(n), with a minimum distance d.
    """
    sample = random.sample(range(n-(k-1)*(d-1)), k)
    return [s + (d-1)*r for s, r in zip(sample, ranks(sample))]

And some sample outputs:

>>> sample_with_minimum_distance()
[17, 27, 3, 38]
>>> sample_with_minimum_distance()
[27, 38, 10, 0]
>>> sample_with_minimum_distance()
[36, 13, 1, 24]
>>> sample_with_minimum_distance()
[1, 25, 15, 39]
>>> sample_with_minimum_distance()
[26, 12, 1, 38]

The "cheap trick" solution

If the various constants here in the original problem are fixed (population range(40), samples of length 4, minimum distance of 10), then there's an obvious cheap trick: there are only 715 possible different sorted samples, so just pre-create a list containing all of them, and then every time you need to generate a sample, pick one from that pre-created list using random.choice.

For the generation, we can either go with a grossly inefficient but clearly correct brute force solution:

>>> import itertools
>>> all_samples = [  # inefficient brute-force solution
...     sample for sample in itertools.product(range(40), repeat=4)
...     if all(x - y >= 10 for x, y in zip(sample[1:], sample))
... ]
>>> len(all_samples)
715

This is still plenty fast enough, taking only a couple of seconds on my machine. Alternatively, we can do something more refined and direct using the same bijection as identified above.

>>> all_samples = [
...     [9*i + s for i, s in enumerate(sample)]
...     for sample in itertools.combinations(range(13), 4)
... ]
>>> len(all_samples)
715

Either way, we generate the list of samples just once, and then use random.choice to pick one every time we need it:

>>> random.choice(all_samples)
(1, 11, 21, 38)
>>> random.choice(all_samples)
(0, 10, 23, 33)

Of course, this solution doesn't scale well: for 7 samples out of range(100) with a minimum distance of 5, there are over 2 billion possible different sorted samples.

Demonstration of uniformity

I claimed earlier that the one-liner produces all possibilities with equal likelihood (assuming a perfect source of random numbers, of course, but Python's Mersenne Twister is good enough that we're unlikely to detect statistical anomalies arising from the core generator in the test below). Here's a demonstration of that uniformity.

First, for convenience, we'll wrap our one-liner in a function. We'll also change it to return a tuple instead of a list, because for the next step we want something hashable.

>>> def sorted_sample():
...     return tuple(9*i + x for i, x in
...                  enumerate(sorted(random.sample(range(13), 4))))

Now we generate 10 million samples (this will take a couple of minutes), and count how often each one occurs:

>>> from collections import Counter
>>> samples = Counter(sorted_sample() for _ in range(10**7))

A couple of quick checks:

>>> len(samples)
715
>>> 10**7 / 715
13986.013986013986
>>> samples[0, 10, 20, 30]
14329
>>> samples[0, 11, 22, 33]
13995
>>> min(samples.values())
13624
>>> max(samples.values())
14329

We've collected 715 different combinations, and a little bit of maths tells us that that's exactly the number we expect (13 choose 4), so with a uniform distribution we'd expect each combination to occur approximately 10**7 / 715 times, or somewhere around 14000 times. Both the combinations we checked above are around 14000, as are the minimum and maximum counts appearing, but not surprisingly, there's some random variation.

Is that random variation within acceptable limits? To find out, we can do a chi-squared test with p = 0.01. Our null hypothesis is that the population we're drawing from is uniform: i.e., that our code generates each possible sample with equal likelihood.

SciPy makes a chi-squared test for uniformity easy:

>>> from scipy.stats import chisquare
>>> chisquare(list(samples.values()))
Power_divergenceResult(statistic=724.682234, pvalue=0.3825060783237031)

The p-value we get is not smaller than 0.01, so we fail to reject the null hypothesis: that is, we have no evidence of non-uniformity.

Thanks for the great and thorough answer. Could you please explain with a little more detail how does the `n-(k-1)*(d-1)` value come from? Thanks — Matteo, Sep 23 '20 at 16:40
There are `k` numbers, so `k-1` "gaps" between those numbers. Each of the gaps has a minimum size `d`. So by reducing the size of each gap by `d-1`, we've reduced to the case where the gaps must have minimum size `1`, which is an easier problem (since it just implies that the samples need to be distinct). The `(k-1)*(d-1)` comes from that reduction. — Mark Dickinson, Sep 23 '20 at 18:37
Great in-depth explanation. However, one issue with this implementation is that the 'random' values are only random within set boundaries - In this case, the value at each index will always be in the corresponding decade. This solution allows for true randomness of the gaps: https://stackoverflow.com/questions/49948065/create-random-sequence-of-integers-within-a-range-and-with-a-minimum-distance-be — maccaroo, Nov 24 '21 at 07:37
@maccaroo: I'm not sure what you mean by "only random within set boundaries": in the example used, the minimum distance is 9, so *most* generated samples will indeed have one value per decade (simply because most of the combinations satisfying the requirements have that property). But it's certainly possible to get (for example) `(1, 10, 19, 31)` as an output. (You can easily check this with the `Counter` code that I show.) — Mark Dickinson, Nov 24 '21 at 08:31
@maccaroo: Whoops, sorry; minimum distance in the example is 10, not 9, so there _are_ no possible outputs that don't have one value per decade. Can you give an example of the sort of possible output you think is missing here? — Mark Dickinson, Nov 24 '21 at 08:37
@MarkDickinson My mistake. It seemed clear in my head yesterday, but I can't see any issues now. — maccaroo, Nov 25 '21 at 06:01

Mad Physicist · Answer 2 · 2018-08-19T19:01:33.860

2

Once you've generated a number, it removes a swath out of your range, since your know that no number can be within +/- 10 of the original.

A naive way to implement this would be to make a list of remaining numbers, and cut chunks out of it every time you pick a number:

domain = list(range(40))
result = []
while domain:
    n = random.choice(domain)
    result.append(n)
    domain = [x for x in domain if x <= n - 10 or x >= x + 10]

Keep in mind that every sample removes up to 19 elements from your domain. That means that you are by no means guaranteed to get 4 elements in the result, but at least 3 are guaranteed.

edited Aug 19 '18 at 19:01

answered Aug 19 '18 at 14:34

Mad Physicist

107,652
25
181
264

I think you want `random.choice` rather than `random.sample` (which requires two arguments). – Mark Dickinson Aug 19 '18 at 16:57
@Mark. Good call. Fixed. Any of `random.choice(domain)`, `random.sample(domain, 1)`, `domain[random.randrange(len(domain))]` would do. `choice` is definitely the best option. – Mad Physicist Aug 19 '18 at 19:07
@MadPhysicist I think you could use a set in an efficient way for the domain correction to be done efficiently. This algorithm is currently quadratic and could be linear using a set – Olivier Melançon Aug 19 '18 at 20:51
1

@OlivierMelançon. I was thinking of doing a binary search on the start and end indices, but for a large enough input sets would be faster. – Mad Physicist Aug 19 '18 at 22:20

score 1 · Answer 3 · answered Aug 19 '18 at 20:42

If the sample size remains proportional to the length of your domain, then one option is to shuffle the domain and pick the first four elements which satisfy the requirement.

Using a set to keep track of which numbers are excluded allows the process to be efficient.

Code

import random


def choose_with_step(domain, step, k):
    domain = list(domain)
    random.shuffle(domain)
    exclusions = set()
    choices = []

    while domain and k > 0:
        choice = domain.pop()

        if choice not in exclusions:
            choices.append(choice)
            for x in range(choice - step + 1, choice + step):
                exclusions.add(x)

            k -= 1

    return choices

Example of output

# choose_with_step(range(40), 10, 4)
[15, 5, 33]
[11, 25, 35, 0]
[27, 12, 37, 0]
[36, 9, 26]

Time-complexity

Since random.shuffle runs in O(n) and the algorithm traverses the shuffled list once, the algorithm is O(n * step).

The algorithm being linear with regard to the domain length is the reason for the requirement for the sample size to be proportional to the domain size, otherwise the list might be shuffled for picking only a few elements.

eamonious · Answer 4 · 2021-04-04T07:39:52.103

For anyone seeking clarification on the top answer's one-line solution, I thought this might be useful:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

The 9 represents: min_distance - 1

The 4 represents: sample_size

The 13 represents: range_size - ((min_distance - 1) * (sample_size - 1))

e.g.; 40 - 9*3 = 13 in the example case.

Also, if you find you run into an error where the sample size you want exceeds the calculated sample range (i.e.; 13 in the example), using random.choices() in place of random.sample() may help you, as it allows replacement when sampling, and achieves close to the same effect as the original solution. For example, to generate a list of 100 random integers with minimum distance 7 for a range of 765, the original solution will not work. However, the following will:

[7*i+x for i,x in enumerate(sorted(random.choices(list(range(72)),k=100)))])

Where the values mirror what I laid out above, except min_distance - 1 is replaced by min_distance. So, 7 equals min_distance, 100 equals sample size, and 72 = range_size - (min_distance * (sample_size - 1)), or 765 - 7*99. This method extrapolates to any values of range, distance, sample for distance * sample < range, which the original solution does not.

The problem with using random.choices() here is that, while it does generate all possible outcomes, it does not guarantee equal likelihood of all possible outcomes, as in the original solution. Depending on the task, however, this may not be important to you.

blhsing · Answer 5 · 2018-08-19T20:02:08.543

0

Since the 4 numbers have to each keep a distance of 10, that leaves a "wiggle room" of just 10 out of 40 for the 4 numbers to be randomly distributed in (because 40 - 3 * 10 = 10). You can therefore simply randomize 4 numbers within the room of 10, calculate the deltas, and add the deltas and the corresponding 10s to get the full list.

import random
d = sorted(random.randint(0, 9) for _ in range(4))
o = [b - a for a, b in zip([0] + d[:-1], d)]
print([i * 10 + sum(o[:i + 1]) for i in range(4)])

A sample of 10 runs:

[1, 13, 24, 37]
[4, 17, 27, 39]
[0, 10, 23, 33]
[1, 12, 27, 37]
[0, 13, 24, 35]
[3, 14, 27, 39]
[0, 11, 21, 38]
[1, 14, 26, 37]
[0, 11, 23, 39]
[1, 15, 28, 38]

edited Aug 19 '18 at 20:02

answered Aug 19 '18 at 15:03

blhsing

91,368
6
71
106

Note that this produces a list that never contains 0 (it always starts at 1 or higher), and with all spacings at least 11. – Mark Dickinson Aug 19 '18 at 16:53
Thanks. I've updated my answer to use `random.randint` instead of `random.sample` to create spacings, since for the purpose of creating spacings `random.sample` is the wrong method to use as it unnecessarily creates unique spacings. – blhsing Aug 19 '18 at 20:08
Yeah, I prefer the `random.sample`-based approach because it gives a uniform distribution (each possible output is equally likely to occur). With the repeated `randint`, an output like `[0, 11, 22, 33]` occurs approximately 24 times as often as `[0, 10, 20, 30]`. On the other hand, the OP didn't specify a desired distribution. – Mark Dickinson Aug 20 '18 at 18:19

score -1 · Answer 6 · answered Aug 19 '18 at 14:45

-1

Depending on the distribution you want, you could do this:

import random
def random_separated(n, start, stop, gap):
    numbers = []
    for i in range(n):
        while True:
            num = random.randint(start, stop)
            if all(n - gap < num < n + gap
                   for n in numbers):
                break
        numbers.append(num)
     return numbers

answered Aug 19 '18 at 14:45

wizzwizz4

6,140
2
26
62

1

This seems even less efficient than what I'm suggesting. – Mad Physicist Aug 19 '18 at 16:40
@MadPhysicist Yeah. I was originally planning to implement yours but with a sparse array, but I found that too hard and so decided to go with one that would waste valuable entropy. This one works for sufficiently large numbers, though. (I started before you posted, but don't have a real keyboard at the moment so was slow.) – wizzwizz4 Aug 19 '18 at 16:43
Yeah, mobile is painful. I'm on a phone as well. – Mad Physicist Aug 19 '18 at 16:47