Subtract all pairs of values from two arrays

Question

I have two vectors, v1 and v2. I'd like to subtract each value of v2 from each value of v1 and store the results in another vector. I also would like to work with very large vectors (e.g. 1e6 size), so I think I should be using numpy for performance.

Up until now I have:

import numpy
v1 = numpy.array(numpy.random.uniform(-1, 1, size=1e2))
v2 = numpy.array(numpy.random.uniform(-1, 1, size=1e2))
vdiff = []
for value in v1:
    vdiff.extend([value - v2])

This creates a list with 100 entries, each entry being an array of size 100. I don't know if this is the most efficient way to do this though. I'd like to calculate the 1e4 desired values very fast with the smallest object size (memory wise) possible.

You want a result which has ~10^6 * 10^6 = 10^12 values? That's not going to work. Even if you used only one byte per value you'd need a terabyte of memory. You could break it up, but that's still a lot of computations. Why do you think you want this giant difference matrix? Perhaps there's a better path to your goal. — DSM, Sep 27 '14 at 16:18
I want to compute differences between two vectors whose values have a given probability density function. The problem is that one of the vectors has values lower than zero with a 4e-6 probability of occuring so I need to generate 1e6 samples to catch them. But lets say I "only" needed 1e4 values per vector, I could I do it? — jpcgandre, Sep 27 '14 at 16:22

score 7 · Accepted Answer · edited Dec 18 '20 at 22:26

7

You're not going to have very much fun with the giant arrays that you mentioned. But if you have more reasonably-sized matrices (small enough that the result can fit in memory), the best way to do this is with broadcasting.

import numpy as np

a = np.array(range(5, 10))
b = np.array(range(2, 6))

res = a[None, :] - b[:, None]
print(res)
# [[3 4 5 6 7]
#  [2 3 4 5 6]
#  [1 2 3 4 5]
#  [0 1 2 3 4]]

edited Dec 18 '20 at 22:26

cs95

379,657
97
704
746

answered Sep 27 '14 at 16:30

Roger Fan

4,945
31
38

Thanks. As a follow up question: How could I plot the empirical cdf of all values of `res`? – jpcgandre Sep 27 '14 at 16:39
1

@jpcgandre There seem to be multiple stackoverflow questions already asking about the same topic. [This answer](http://stackoverflow.com/a/3220681/2922139) suggests using the [`ECDF`](http://statsmodels.sourceforge.net/devel/generated/statsmodels.distributions.empirical_distribution.ECDF.html) function in `statsmodels`. Though you might want to first convert res to a 1-d array using [`np.ravel`](http://stackoverflow.com/a/3220681/2922139) or [`np.flatten`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html). – Roger Fan Sep 27 '14 at 16:43

score 3 · Answer 2 · edited Dec 18 '20 at 22:26

3

`np.subtract.outer`

You can use np.ufunc.outer with np.subtract and then transpose:

a = np.array(range(5, 10))
b = np.array(range(2, 6))

res1 = np.subtract.outer(a, b).T

res2 = a[None, :] - b[:, None]
assert np.array_equal(res1, res2)

Performance is comparable between the two methods.

edited Dec 18 '20 at 22:26

cs95

379,657
97
704
746

answered Nov 18 '18 at 14:56

jpp

159,742
34
281
339

Subtract all pairs of values from two arrays

2 Answers2

`np.subtract.outer`

Linked

Related