48

Let's say I have 2 one-dimensional (1D) numpy arrays, a and b, with lengths n1 and n2 respectively. I also have a function, F(x,y), that takes two values. Now I want to apply that function to each pair of values from my two 1D arrays, so the result would be a 2D numpy array with shape n1, n2. The i, j element of the two-dimensional array would be F(a[i], b[j]).

I haven't been able to find a way of doing this without a horrible amount of for-loops, and I'm sure there's a much simpler (and faster!) way of doing this in numpy.

Thanks in advance!

Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
Misconstruction
  • 1,839
  • 4
  • 17
  • 23

6 Answers6

26

You can use numpy broadcasting to do calculation on the two arrays, turning a into a vertical 2D array using newaxis:

In [11]: a = np.array([1, 2, 3]) # n1 = 3
    ...: b = np.array([4, 5]) # n2 = 2
    ...: #if function is c(i, j) = a(i) + b(j)*2:
    ...: c = a[:, None] + b*2

In [12]: c
Out[12]: 
array([[ 9, 11],
       [10, 12],
       [11, 13]])

To benchmark:

In [28]: a = arange(100)

In [29]: b = arange(222)

In [30]: timeit r = np.array([[f(i, j) for j in b] for i in a])
10 loops, best of 3: 29.9 ms per loop

In [31]: timeit c = a[:, None] + b*2
10000 loops, best of 3: 71.6 us per loop
endolith
  • 25,479
  • 34
  • 128
  • 192
zhangxaochen
  • 32,744
  • 15
  • 77
  • 108
  • 1
    `newaxis` may be a formal way to add a new axis to ndarray, even if they are the same. – Kattern Jan 20 '14 at 06:50
  • 1
    Yes, they are the same in the internal code. But `newaxis` is easier to understand, something more like a grammar sugar. – Kattern Jan 20 '14 at 06:58
  • 1
    This looks like what I'm looking for, I will try it out immediately! – Misconstruction Jan 22 '14 at 01:20
  • This works great! Could this be expanded to comparing two 2D arrays? So the result would be that function of each pair of columns between the two 2D arrays – Misconstruction Jan 22 '14 at 19:21
  • assuming that I have a numpy array and I want to apply a function in row pairs and accumulate the result e.g. row0+row1 , output of this + row3 etc, how can I do this in a smart way? – seralouk Apr 20 '19 at 16:28
15

If F is beyond your control, you can wrap it automatically to be "vector-aware" by using numpy.vectorize. I present a working example below where I define my own F just for completeness. This approach has the simplicity advantage, but if you have control over F, rewriting it with a bit of care to vectorize correctly can have huge speed benefits

import numpy

n1 = 100
n2 = 200

a = numpy.arange(n1)
b = numpy.arange(n2)

def F(x, y):
    return x + y

# Everything above this is setup, the answer to your question lies here:
fv = numpy.vectorize(F)
r = fv(a[:, numpy.newaxis], b)

On my computer, the following timings are found, showing the price you pay for "automatic" vectorisation:

%timeit fv(a[:, numpy.newaxis], b)
100 loops, best of 3: 3.58 ms per loop

%timeit F(a[:, numpy.newaxis], b)
10000 loops, best of 3: 38.3 µs per loop
chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66
  • I suggest this `In [5]: %timeit a[:, numpy.newaxis] + b` `49.9 µs ± 337 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)` – Beatriz Fonseca May 31 '18 at 18:15
  • Beautiful! Very generic answer which deserves a clear question in its own right. A lot of this is about language - broadcasting, outer product, reduce, correlator with arbitrary function, .... Are there any circumstances in which your method would fail? Can np.vectorize work on an arbitrary function/lambda? And if not? – jtlz2 Nov 10 '20 at 10:50
  • 1
    @jtlz2 `numpy.vectorise` works on all functions since it effectively works as a for loop (see the note in [the docs](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html)). So vectorise is a quick way to avoid writing the for loop yourself, but as I show it doesn't actualy makes the code run any faster. To do that you'd need to use one of the other techniques you mention. – chthonicdaemon Nov 10 '20 at 12:42
3

If F() works with broadcast arguments, definitely use that, as others describe.
An alternative is to use np.fromfunction (function_on_an_int_grid would be a better name.) The following just maps the int grid to your a-b grid, then into F():

import numpy as np

def func_allpairs( F, a, b ):
    """ -> array len(a) x len(b):
        [[ F( a0 b0 )  F( a0 b1 ) ... ]
         [ F( a1 b0 )  F( a1 b1 ) ... ]
         ...
        ]
    """
    def fab( i, j ):
        return F( a[i], b[j] )  # F scalar or vec, e.g. gradient

    return np.fromfunction( fab, (len(a), len(b)), dtype=int )  # -> fab( all pairs )


#...............................................................................
def F( x, y ):
    return x + 10*y

a = np.arange( 100 )
b = np.arange( 222 )
A = func_allpairs( F, a, b )
# %timeit: 1000 loops, best of 3: 241 µs per loop -- imac i5, np 1.9.3
denis
  • 21,378
  • 10
  • 65
  • 88
2

You could use list comprehensions to create an array of arrays:

import numpy as np

# Arrays
a = np.array([1, 2, 3]) # n1 = 3
b = np.array([4, 5]) # n2 = 2

# Your function (just an example)
def f(i, j):
    return i + j

result = np.array([[f(i, j)for j in b ]for i in a])
print result

Output:

[[5 6]
 [6 7]
 [7 8]]
Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
Christian Tapia
  • 33,620
  • 7
  • 56
  • 73
2

As another alternative that's a bit more extensible than the dot-product, in less than 1/5th - 1/9th the time of nested list comprehensions, use numpy.newaxis (took a bit more digging to find):

>>> import numpy
>>> a = numpy.array([0,1,2])
>>> b = numpy.array([0,1,2,3])

This time, using the power function:

>>> pow(a[:,numpy.newaxis], b)
array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 2, 4, 8]])

Compared with an alternative:

>>> numpy.array([[pow(i,j) for j in b] for i in a])
array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 2, 4, 8]])

And comparing the timing:

>>> import timeit
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.943181037902832
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
5.985810041427612

>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
109.74687385559082
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
11.989138126373291
Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
1

May I suggest, if your use-case is more limited to products, that you use the outer-product?

e.g.:

import numpy

a = array([0, 1, 2])
b = array([0, 1, 2, 3])

numpy.outer(a,b)

returns

array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6]])

You can then apply other transformations:

numpy.outer(a,b) + 1

returns

array([[1, 1, 1, 1],
       [1, 2, 3, 4],
       [1, 3, 5, 7]])

This is much faster:

>>> import timeit
>>> timeit.timeit('numpy.array([[i*j for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.79583477973938

>>> timeit.timeit('numpy.outer(a,b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
9.351550102233887
>>> timeit.timeit('numpy.outer(a,b)+1', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
12.308301210403442
Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
  • 2
    interesting, but does this not assume that the function always takes the product of the two input values? What if you wanted a more complicated function? – Misconstruction Jan 22 '14 at 01:18
  • That's fair, but this is an incredibly common operation, relative to others you might want. If you want a more complicated function, may I suggest my other answer here: http://stackoverflow.com/questions/21226610/numpy-quirk-apply-function-to-all-pairs-of-two-1d-arrays-to-get-one-2d-array/21273998#21273998 If I may be slightly presumptuous, would you be willing to share your mystery function? – Russia Must Remove Putin Jan 22 '14 at 04:21