Compute random values inside a euclidean distance efficiently (in terms of time) Python

Question

Could you help me compute N random pairs vectors x and y with size d and such that x_i and y_i belong to the real range U[-1,1] and such that the euclidean distance between x and y could be less or equal than a small T? I need to compute these N pairs efficiently (in terms of time) in Python.

I tried

 import numpy as np
 d = 4
 inv_d = 1.0 / d
 random_lst = []
 T = 0.05
 N = 10
 for i in range(N):
     while(1):
         x = np.random.uniform(low=-1, high=1, size=d)
         y = np.random.uniform(low=-1, high=1, size=d)
         length = np.linalg.norm(x-y)
         if length <= T:
             random_lst.append([x,y])
             print(length)
             break
 print(random_lst)

But it took a lot of time (40s), and I need N near from 10^6, that is maybe it could be take even more time

Martin Wettstein · Answer 1 · 2021-08-30T08:50:43.643

1

You could just make y dependent on x. If it has to lie less than T away from x, just create a random vector with the absolute length smaller than T and add it to x. So, you get a pair of vectors close to each other.

Since the distance in no dimension may be larger than T, the space in which this vector may lie is bounded by a d-dimensional cube with size T. That's a way smaller space than the original [-1,1] space.

x = np.random.uniform(low=-1, high=1, size=d)
dxy = np.random.uniform(low=-T, high=T, size=d) ## Maybe, you are lucky in the first go
while np.linalg.norm(dxy)>T:
    dxy = np.random.uniform(low=-T, high=T, size=d) ## Repeat until lucky.
y = np.add(x,dxy)

EDIT:

If you need it much faster, you should not randomly pick from a box to find a point within a given range. Just rescale the deviation to a random value smaller than T, regardless of its original length:

x = np.random.uniform(low=-1, high=1, size=d)
dxy = np.random.uniform(low=-1, high=1, size=d) ## will be too large in most cases
scale = T/np.linalg.norm(dxy) ## Factor by which it has to be multiplied to be exactly T
dxy = np.multiply(dxy,random.random()*scale) ## Random value between 0 and scale
y = np.add(x,dxy)

So, you only have to randomly pick one second vector and only have to compute one length per pair. That should speed things up, as these are the time limiting operations.

edited Aug 30 '21 at 08:50

answered Aug 30 '21 at 07:49

Martin Wettstein

2,771
2
9
15

thanks @Martin, this pretty nice but still slow, because dim ~ 512 and the number of pairs I need is near from 10^6. I measured with dim ~64 and number of pairs samples 10^6 these settings and this took more than 150s – Juan Aug 30 '21 at 08:09
OK, I did not know that the requirements are that high. You might slightly decrease the size of the small box to make it more efficient. As there is only a 50% chance that a point inside the `[-T;T]` space lies within a spere with radius `<=T`, you get lots of misses. You could reduce the size of the box to `[-.8*T;.8*T]` to make the algorithm almost twice as fast. Or you could just leave out the `while` and rescale `dxy` if it is larger than T. You can rescale it by any random scalar smaller than `|dxy|/T`. – Martin Wettstein Aug 30 '21 at 08:21
This was my first idea, too, but while the probability of a "hit" is pi/4 in 2 dimensions, I think it goes down quickly for more dimensions and will indeed be _very_ low for 512 dimensions (the max distance to one of the "corners" in the hypercube will be ~22) – tobias_k Aug 30 '21 at 08:38
1

In that case, rescaling the small vector would be the best choice. See my EDIT. – Martin Wettstein Aug 30 '21 at 08:51

constt · Answer 2 · 2021-08-30T08:43:27.213

Here is a quick and straightforward approximate solution to the problem. The resulting vectors will be always less and close enough but not equal to the given length. First, let's implement a helper function that is used to split any natural number into a set of positive random numbers which add up to the given input number.

def split(num, buckets=1, result=None):

    if result is None:
        result = []

    if buckets < 2:
        result.append(num)
        np.random.shuffle(result)
        return result

    bucket = np.random.uniform(low=0, high=num) if num > 0 else 0
    result.append(bucket)

    return split(num - bucket, buckets - 1, result)

Here is how it works

>>> xs = split(10, buckets=3)
>>> xs
[7.60495737395197, 0.6968567573189194, 1.698185868729111]
>>> sum(xs)
10.0

Now, let's make a function that returns a pair of points from the Euclidean space given the coordinate bounds, number of dimensions, and distance.

def point_pair(low=-1, high=1, dim=2, dist=0.05):
    x = np.random.uniform(low=-1, high=1, size=dim)
    diff = split(dist, dim)
    y = np.clip(x + diff, low, high)
    return x, y

This is how it works:

>>> point_pair(dim=4, dist=0.05)
(array([ 0.18856095, -0.02635086, -0.59616698, -0.51447733]),
 array([ 0.20485765, -0.01791704, -0.59026813, -0.49510669]))

Finally, let's test it out:

>>> pairs = []
>>> for _ in range(10):
        pairs.append(point_pair(dim=4, dist=0.05))
>>> all(np.linalg.norm(x - y) < 0.05 for x, y in pairs)
True

Here is a simple run test on a fairly slow machine Intel(R) Core(TM) m5-6Y54 CPU @ 1.10GHz:

>>> %timeit point_pair(dim=4, dist=0.05)
38.6 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

thanks @constt, this pretty nice but still slow, because dim ~ 512 and the number of pairs I need is near from 10**6. I measured with dim ~64 and number of pairs samples 10^6 these settings and this took like 150s — Juan, Aug 30 '21 at 08:09
@Juan OK, I see. You can try to speed it up using PyPy though. — constt, Aug 30 '21 at 08:18
I like the idea, but shouldn't you do `diff = [k**0.5 for k in split(dist**2, dim)]`? The distance is `sq(sum(i) (xi-yi)**2)`, so you should split the square of the distance and then get the square root of the parts to get a point at exactly the allowed distance. (Then, get a random distance up to the allowed distance and give each part a random sign.) — tobias_k, Aug 30 '21 at 08:34
Also, note that the `split` function does not produce a uniform split. The estimated value for the first part with be 1/2, then 1/4 for the second, 1/8 for the third, and so on. You can see this by printing e.g. `split(100, 10)`. Still, the basic idea is sound. — tobias_k, Aug 30 '21 at 08:36

score 0 · Answer 3 · answered Aug 30 '21 at 06:10

0

One improvement I can think of is to pick y from the circle around x with radius T (instead of randomly picking it everywhere and checking if that's in the circle). This way you don't need the inner loop and it may reduce the run time significantly.

answered Aug 30 '21 at 06:10

eagr

58
7

thanks, do you know how I can pick randomly from that circle using python? – Juan Aug 30 '21 at 06:13
@Juan You can translate aioobe's answer: https://stackoverflow.com/a/5838055/15174775 into python codes. – HALF9000 Aug 30 '21 at 06:23
Please add further details to expand on your answer, such as working code or documentation citations. – Community Aug 30 '21 at 10:17

score 0 · Answer 4 · answered Aug 30 '21 at 09:05

(Building upon the idea by constt, but I think there are two problems with the approach: The split function is not really uniform, and the distance is not quite right.)

The idea is: The distance between two points is sqrt(sum(i) (xi-yi)**2), so to get a random point at exactly that distance we can split the square of the distance into random parts, then get the sqrt of those.

First, to get a uniform split into k parts, we can generate k-1 uniformly distributed numbers, and then get the pairwise difference of those (when sorted) and the two boundaries:

def split(val, k):
    nums = np.random.uniform(low=0, high=val, size=k-1)
    return np.diff([0] + sorted(nums) + [val])

for _ in range(10):
    sp = split(100, 10)
    print([round(x, 1) for x in sp], sum(sp), len(sp))

Then to get the pairs of points, we generate one random point, then apply the difference by splitting the square of the difference and getting the sqrt of the parts:

def points_pair(dim, max_dist):
    X = np.random.uniform(low=-1, high=1, size=dim)
    diff = [x**.5 for x in split(max_dist**2, dim)]
    return X, X + diff

for _ in range(10):
    x, y = points_pair(512, 0.1)
    print(np.linalg.norm(x - y))

This will only generate points at exactly the max_dist and only in the "upper-right" quadrant (or whatever is it's equivalent in 512 dimensions). To get a random second point in any quadrant and up to that distance, also randomize the distance and the signs of the components:

    dist = np.random.uniform(0, max_dist)
    diff = [x**.5 * random.choice([+1, -1]) for x in split(dist**2, dim)]

(You might also have to check and re-roll the diff if X + diff is outside the bounds of [-1, +1].)

Compute random values inside a euclidean distance efficiently (in terms of time) Python

4 Answers4