
I have an array a of numbers.

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
# array([-11.36680112, -15.83791699, -39.86827287, -33.81273354,
#       -19.55547753])

I want to subtract the whole array from each of its elements, i.e. compute all pairwise differences within the array. The expected output would be:

[array([ 0.        ,  4.47111586, 28.50147174, 22.44593241,  8.18867641]),
 array([-4.47111586,  0.        , 24.03035588, 17.97481655,  3.71756054]),
 array([-28.50147174, -24.03035588,   0.        ,  -6.05553933,
        -20.31279534]),
 array([-22.44593241, -17.97481655,   6.05553933,   0.        ,
        -14.25725601]),
 array([-8.18867641, -3.71756054, 20.31279534, 14.25725601,  0.        ])]

My first approach was a list comprehension, [i - a for i in a]. However, my original array a is quite large, and I have thousands of such arrays on which I need to do the same operation, so the overall process becomes slow and memory hungry to the point where the Jupyter kernel dies.
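For reference, a runnable version of that baseline with the small example array above; it builds the list of arrays shown in the expected output:

import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# one pass per element: each element minus the whole array
diffs = [i - a for i in a]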

Is there any way I could speed this up?

cmbfast
  • look at my answer using numba; numpy is great for smaller arrays but slows down on larger arrays for vector operations – eroot163pi Aug 14 '21 at 10:07

2 Answers


The easiest way is to use broadcasting:

import numpy as np
rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
a[:, None] - a 

which outputs:

array([[  0.        ,   4.47111586,  28.50147174,  22.44593241,
          8.18867641],
       [ -4.47111586,   0.        ,  24.03035588,  17.97481655,
          3.71756054],
       [-28.50147174, -24.03035588,   0.        ,  -6.05553933,
        -20.31279534],
       [-22.44593241, -17.97481655,   6.05553933,   0.        ,
        -14.25725601],
       [ -8.18867641,  -3.71756054,  20.31279534,  14.25725601,
          0.        ]])
Péter Leéh
  • Indeed, this is so much faster than my original list comprehension. For an array with 30000 elements it took less than 1 sec, whereas my original method crashed Jupyter. However, for an array with slightly more elements (40000) it throws an out-of-memory error: `MemoryError: Unable to allocate 11.9 GiB for an array with shape (40000, 40000) and data type float64` – cmbfast Aug 14 '21 at 09:28
  • @cmbfast If your result cannot fit into memory in one chunk, you simply cannot do that without writing it to disk. Maybe h5py can help you with that. – Péter Leéh Aug 14 '21 at 09:43
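A minimal sketch of the chunked-to-disk approach suggested in the comment above, assuming h5py is available; the file name, dataset name, and block size are illustrative, not part of the original answer:

import h5py
import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 40000)
block = 1000  # rows broadcast per step; tune to available RAM

with h5py.File("pairwise_diff.h5", "w") as f:
    dset = f.create_dataset("diff", shape=(a.size, a.size), dtype="float64")
    for start in range(0, a.size, block):
        stop = min(start + block, a.size)
        # broadcast only a block of rows at a time, then flush it to disk
        dset[start:stop] = a[start:stop, None] - a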

There are two ways you can do this; the first uses only NumPy vectorization.

  1. Memory inefficient (although in this case NumPy is fast). This should still be the first method you try if the array size is small enough:

a[:, None] - a

  2. Using numba + numpy: numba's LLVM optimization can work magic on speed, and the parallel=True option can multiply the speedup further. For very large arrays this (or C++) should be the go-to.

For size 40000, this finishes in 3 s without parallelism and in 0.6 s with parallelism on my 12-core machine.

import numpy as np
import numba as nb

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# return type: nb.float64[:, :]
# input argument type: nb.float64[:]
# Specifying these enables eager compilation instead of lazy compilation.
# You can also add parallel=True and cache=True,
# and nogil=True if you are using Python threading.
# numba does SIMD vectorization, which means it should not lose to numpy
# on performance grounds if coded properly.
@nb.njit(nb.float64[:, :](nb.float64[:]))
def speed(a):
    # empty to prevent unnecessary initializations
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)

    # nb.prange needed to tell numba this for loop can be parallelized
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

speed(a)

Performance

import numpy as np
import numba as nb
import sys
import time


@nb.njit(nb.float64[:, :](nb.float64[:]))
def f1(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

@nb.njit(nb.float64[:, :](nb.float64[:]), parallel=True, cache=True)
def f2(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

def f3(a):
    return a[:, None] - a

if __name__ == '__main__':
    s0 = time.time()
    rnd = np.random.default_rng(12345)
    a = rnd.uniform(0, -50, int(sys.argv[2]))
    b = eval(sys.argv[1] + '(a)')
    print(time.time() - s0)
(base) xxx:~$ python test.py f1 40000
3.0324509143829346
(base) xxx:~$ python test.py f2 40000
0.6196465492248535
(base) xxx:~$ python test.py f3 40000
2.4126882553100586

I faced similar restrictions and needed something fast. I got around a 50x speedup without parallelism just by resolving the memory usage and using numba; see Why are np.hypot and np.subtract.outer very fast?
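For completeness, the ufunc outer form discussed in that linked question produces the same matrix as the broadcasting expression; a small correctness check (not a benchmark):

import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# np.subtract.outer(a, a)[i, j] == a[i] - a[j], same result as a[:, None] - a
out = np.subtract.outer(a, a)
assert np.array_equal(out, a[:, None] - a)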

eroot163pi