19

Is it possible to construct a numpy matrix from a function? In this case specifically the function is the absolute difference of two vectors: S[i,j] = abs(A[i] - B[j]). A minimal working example that uses regular python:

import numpy as np

A = np.array([1,3,6])
B = np.array([2,4,6])
S = np.zeros((3,3))

for i,x in enumerate(A):
    for j,y in enumerate(B):
        S[i,j] = abs(x-y)

Giving:

[[ 1.  3.  5.]
 [ 1.  1.  3.]
 [ 4.  2.  0.]]

It would be nice to have a construction that looks something like:

def build_matrix(shape, input_function, *args)

where I can pass an input function with it's arguments and retain the speed advantage of numpy.

Hooked
  • 84,485
  • 43
  • 192
  • 261
  • This is possible. What have you tried? – Marcin Mar 14 '12 at 15:13
  • @Marcin - as stated in the question, I'm using a plain old python approach to populate the matrix right now. Looking over the docs of numpy suggest that the function `vectorize` might be of use, but I still didn't see how to construct the matrix from the function in the first place. If you could point me in the right direction (documentation-wise) I'd appreciate it! – Hooked Mar 14 '12 at 15:17
  • This should be possible in plain python. What have you tried to create your build_matrix function? Surely you have something, and are stuck somewhere, rather than hoping that someone will write it all for you. – Marcin Mar 14 '12 at 15:23
  • 2
    @marcin I think you misunderstand the question. I _have_ a working solution, one that doesn't use `numpy` (which was posted). I've _looked_ in the documentation to find a function that creates a numpy array from arguments and _haven't found it_. I'm _looking_ for a native numpy call so that I can write my own `build_matrix` function, rather than "have someone write it all for me". The issue is speed that is gained when working with numpy's internal vectorized representation. – Hooked Mar 14 '12 at 15:42

2 Answers2

23

In addition to what @JoshAdel has suggested, you can also use the outer method of any numpy ufunc to do the broadcasting in the case of two arrays.

In this case, you just want np.subtract.outer(A, B) (Or, rather, the absolute value of it).

While either one is fairly readable for this example, in some cases broadcasting is more useful, while in others using ufunc methods is cleaner.

Either way, it's useful to know both tricks.

E.g.

import numpy as np

A = np.array([1,3,6])
B = np.array([2,4,6])

diff = np.subtract.outer(A, B)
result = np.abs(diff)

Basically, you can use outer, accumulate, reduce, and reduceat with any numpy ufunc such as subtract, multiply, divide, or even things like logical_and, etc.

For example, np.cumsum is equivalent to np.add.accumulate. This means you could implement something like a cumdiv by np.divide.accumulate if you even needed to.

Joe Kington
  • 275,208
  • 71
  • 604
  • 463
  • Thanks @JoeKington, do you know offhand if there is any internal difference between the broadcasting method and the outer method? For practicality, I don't notice any speed difference for small tests on my machine, so I guess I can use either one. – Hooked Mar 14 '12 at 15:50
  • @JoeKington +1 Definitely another great set of numpy tricks to keep up one's sleeve. In someways I actually like this approach better since I find the syntax to be more descriptive. – JoshAdel Mar 14 '12 at 16:02
  • 1
    @talonmies On my machine for arrays with 10,100,1000 and 10000 elements, the broadcasting and outer methods have pretty much the same timings, with the broadcast method winning by a small amount. – JoshAdel Mar 14 '12 at 16:19
19

I recommend taking a look into numpy's broadcasting capabilities:

In [6]: np.abs(A[:,np.newaxis] - B)
Out[6]: 
array([[1, 3, 5],
       [1, 1, 3],
       [4, 2, 0]])

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Then you could simply write your function as:

In [7]: def build_matrix(func,args):
   ...:     return func(*args)
   ...: 

In [8]: def f1(A,B):
   ...:     return np.abs(A[:,np.newaxis] - B)
   ...: 

In [9]: build_matrix(f1,(A,B))
Out[9]: 
array([[1, 3, 5],
       [1, 1, 3],
       [4, 2, 0]])

This should also be considerably faster than your solution for larger arrays.

JoshAdel
  • 66,734
  • 27
  • 141
  • 140
  • This is perfect and has a _considerable_ gain for large N. I'll look more into broadcasting, thank you. – Hooked Mar 14 '12 at 15:46
  • 2
    To avoid creating an intermediate array that holds the difference [`numpexpr`](https://code.google.com/p/numexpr/) could be used: `c = a[:,None]; result = numexpr.evaluate("abs(c - b)")` – jfs Mar 14 '12 at 20:18
  • 1
    @J.F.Sebastian - For whatever it's worth, you can also avoid it by taking the absolute value in-place if you don't have `numexpr` installed. It's slightly more verbose, though: `c = a[:,None]; result = c - b; np.abs(result, result)` – Joe Kington Mar 14 '12 at 20:35