3

I have two vectors, v1 and v2. I'd like to subtract each value of v2 from each value of v1 and store the results in another vector. I also would like to work with very large vectors (e.g. 1e6 size), so I think I should be using numpy for performance.

Up until now I have:

import numpy
v1 = numpy.array(numpy.random.uniform(-1, 1, size=1e2))
v2 = numpy.array(numpy.random.uniform(-1, 1, size=1e2))
vdiff = []
for value in v1:
    vdiff.extend([value - v2])

This creates a list with 100 entries, each entry being an array of size 100. I don't know if this is the most efficient way to do this though. I'd like to calculate the 1e4 desired values very fast with the smallest object size (memory wise) possible.

jpp
  • 159,742
  • 34
  • 281
  • 339
jpcgandre
  • 1,487
  • 5
  • 31
  • 55
  • 2
    You want a result which has ~10^6 * 10^6 = 10^12 values? That's not going to work. Even if you used only one byte per value you'd need a terabyte of memory. You could break it up, but that's still a lot of computations. Why do you think you want this giant difference matrix? Perhaps there's a better path to your goal. – DSM Sep 27 '14 at 16:18
  • I want to compute differences between two vectors whose values have a given probability density function. The problem is that one of the vectors has values lower than zero with a 4e-6 probability of occuring so I need to generate 1e6 samples to catch them. But lets say I "only" needed 1e4 values per vector, I could I do it? – jpcgandre Sep 27 '14 at 16:22

2 Answers2

7

You're not going to have very much fun with the giant arrays that you mentioned. But if you have more reasonably-sized matrices (small enough that the result can fit in memory), the best way to do this is with broadcasting.

import numpy as np

a = np.array(range(5, 10))
b = np.array(range(2, 6))

res = a[None, :] - b[:, None]
print(res)
# [[3 4 5 6 7]
#  [2 3 4 5 6]
#  [1 2 3 4 5]
#  [0 1 2 3 4]]
cs95
  • 379,657
  • 97
  • 704
  • 746
Roger Fan
  • 4,945
  • 31
  • 38
  • Thanks. As a follow up question: How could I plot the empirical cdf of all values of `res`? – jpcgandre Sep 27 '14 at 16:39
  • 1
    @jpcgandre There seem to be multiple stackoverflow questions already asking about the same topic. [This answer](http://stackoverflow.com/a/3220681/2922139) suggests using the [`ECDF`](http://statsmodels.sourceforge.net/devel/generated/statsmodels.distributions.empirical_distribution.ECDF.html) function in `statsmodels`. Though you might want to first convert res to a 1-d array using [`np.ravel`](http://stackoverflow.com/a/3220681/2922139) or [`np.flatten`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html). – Roger Fan Sep 27 '14 at 16:43
3

np.subtract.outer

You can use np.ufunc.outer with np.subtract and then transpose:

a = np.array(range(5, 10))
b = np.array(range(2, 6))

res1 = np.subtract.outer(a, b).T
res2 = a[None, :] - b[:, None]
assert np.array_equal(res1, res2)

Performance is comparable between the two methods.

cs95
  • 379,657
  • 97
  • 704
  • 746
jpp
  • 159,742
  • 34
  • 281
  • 339