
I have an array a of numbers.

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
# array([-11.36680112, -15.83791699, -39.86827287, -33.81273354,
#       -19.55547753])

I want to subtract the whole array from each of its elements, i.e. compute all pairwise differences within the array. The expected output would be:

[array([ 0.        ,  4.47111586, 28.50147174, 22.44593241,  8.18867641]),
 array([-4.47111586,  0.        , 24.03035588, 17.97481655,  3.71756054]),
 array([-28.50147174, -24.03035588,   0.        ,  -6.05553933,
        -20.31279534]),
 array([-22.44593241, -17.97481655,   6.05553933,   0.        ,
        -14.25725601]),
 array([-8.18867641, -3.71756054, 20.31279534, 14.25725601,  0.        ])]

My first approach was a list comprehension, [i - a for i in a]. However, my original array a is quite large, and I have thousands of such arrays on which I need to do the same operation, so the overall process becomes slow and memory hungry to the point where the Jupyter kernel dies.
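For reference, a runnable version of that baseline with the small example array above; it builds the list of arrays shown in the expected output:

import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# one pass per element: each element minus the whole array
diffs = [i - a for i in a]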

Is there any way I could speed this up?

cmbfast
  • look at my answer using numba; numpy is great for smaller arrays but slows down on larger arrays for vector operations – eroot163pi Aug 14 '21 at 10:07

2 Answers


The easiest way is to use broadcasting:

import numpy as np
rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
a[:, None] - a 

which outputs:

array([[  0.        ,   4.47111586,  28.50147174,  22.44593241,
          8.18867641],
       [ -4.47111586,   0.        ,  24.03035588,  17.97481655,
          3.71756054],
       [-28.50147174, -24.03035588,   0.        ,  -6.05553933,
        -20.31279534],
       [-22.44593241, -17.97481655,   6.05553933,   0.        ,
        -14.25725601],
       [ -8.18867641,  -3.71756054,  20.31279534,  14.25725601,
          0.        ]])
Péter Leéh
  • Indeed, this is so much faster than my original list comprehension. For an array with 30000 elements it took less than 1 sec, whereas my original method crashed Jupyter. However, for an array with slightly more elements (40000) it throws an out-of-memory error: `MemoryError: Unable to allocate 11.9 GiB for an array with shape (40000, 40000) and data type float64` – cmbfast Aug 14 '21 at 09:28
  • @cmbfast If your result cannot fit into memory in one chunk, you simply cannot do that without writing it to disk. Maybe h5py can help you with that. – Péter Leéh Aug 14 '21 at 09:43
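A minimal sketch of the chunked-to-disk approach suggested in the comment above, assuming h5py is available; the file name, dataset name, and block size are illustrative, not part of the original answer:

import h5py
import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 40000)
block = 1000  # rows broadcast per step; tune to available RAM

with h5py.File("pairwise_diff.h5", "w") as f:
    dset = f.create_dataset("diff", shape=(a.size, a.size), dtype="float64")
    for start in range(0, a.size, block):
        stop = min(start + block, a.size)
        # broadcast only a block of rows at a time, then flush it to disk
        dset[start:stop] = a[start:stop, None] - a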

There are two ways you can do this; the first uses only NumPy vectorization.

  1. Memory inefficient (although in this case NumPy is fast). This should still be the first method you try if the array size is small enough:

a[:, None] - a

  2. Using numba + numpy: numba's LLVM optimization can work magic on speed, and the parallel=True option can multiply the speedup further. For very large arrays this (or C++) should be the go-to.

For size 40000, this finishes in 3 s without parallelism and in 0.6 s with parallelism on my 12-core machine.

import numpy as np
import numba as nb

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# return type: nb.float64[:, :]
# input argument type: nb.float64[:]
# Specifying these enables eager compilation instead of lazy compilation.
# You can also add parallel=True and cache=True,
# and nogil=True if you are using Python threading.
# numba does SIMD vectorization, which means it should not lose to numpy
# on performance grounds if coded properly.
@nb.njit(nb.float64[:, :](nb.float64[:]))
def speed(a):
    # empty to prevent unnecessary initializations
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)

    # nb.prange needed to tell numba this for loop can be parallelized
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

speed(a)

Performance

import numpy as np
import numba as nb
import sys
import time


@nb.njit(nb.float64[:, :](nb.float64[:]))
def f1(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

@nb.njit(nb.float64[:, :](nb.float64[:]), parallel=True, cache=True)
def f2(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

def f3(a):
    return a[:, None] - a

if __name__ == '__main__':
    s0 = time.time()
    rnd = np.random.default_rng(12345)
    a = rnd.uniform(0, -50, int(sys.argv[2]))
    b = eval(sys.argv[1] + '(a)')
    print(time.time() - s0)
(base) xxx:~$ python test.py f1 40000
3.0324509143829346
(base) xxx:~$ python test.py f2 40000
0.6196465492248535
(base) xxx:~$ python test.py f3 40000
2.4126882553100586

I faced similar restrictions and needed something fast. I got around a 50x speedup without parallelism just by resolving the memory usage and using numba; see Why are np.hypot and np.subtract.outer very fast?
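For completeness, the ufunc outer form discussed in that linked question produces the same matrix as the broadcasting expression; a small correctness check (not a benchmark):

import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# np.subtract.outer(a, a)[i, j] == a[i] - a[j], same result as a[:, None] - a
out = np.subtract.outer(a, a)
assert np.array_equal(out, a[:, None] - a)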

eroot163pi