68

I want to iterate over a matrix and apply a function to each row. How can I do this for a NumPy matrix?

erogol
  • 2
    It is likely that you will get more helpful answers if you explain what you are trying to achieve / what kind of function to apply. Also, you may want to have a look at: http://stackoverflow.com/questions/8079061/function-application-over-numpys-matrix-row-column – root May 09 '13 at 18:42
  • 2
    please post your code. If you haven't tried to do it yet, go try some stuff and post what problems you have – Ryan Saxe May 09 '13 at 18:49

3 Answers

88

You can use numpy.apply_along_axis(). Assuming that your array is 2D, you can use it like:

import numpy as np

myarray = np.array([[11, 12, 13],
                    [21, 22, 23],
                    [31, 32, 33]])
def myfunction(x):
    return x[0] + x[1]**2 + x[2]**3

print(np.apply_along_axis(myfunction, axis=1, arr=myarray))
#[ 2352 12672 36992]
Saullo G. P. Castro
  • 10
    if you are using `numpy` functions you can (usually) just specify the axis, like: `mymatrix.sum(axis=1)`. – root May 09 '13 at 19:03
  • 1
    that's right, the sum() in myfunction was just an example, but for some cases, like [here](http://stackoverflow.com/questions/15094619/fitting-a-3d-array-of-data-to-a-1d-function-with-numpy-or-scipy/16315330#16315330), `np.apply_along_axis()` can be very useful – Saullo G. P. Castro May 09 '13 at 19:06
  • 1
    It can be, yes - not knowing the function makes the question ambiguous. – root May 09 '13 at 19:11
  • 32
    The problem is that `apply_along_axis` is a Python for loop in disguise. It can give the illusion of numpy performance, but it will not deliver it. In the question you link, using `apply_along_axis` has no benefit over using a for loop. Trying to vectorize whatever function you want to apply to every row is the numpythonic way of doing things. – Jaime May 09 '13 at 19:34
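As a rough sketch of the vectorized approach this comment describes, the row function from the answer above can be written with whole-column slices, so the arithmetic runs inside NumPy instead of once per row in Python. The names `vectorized` and `looped` are introduced here only for illustration, and this rewrite is specific to that example function.

```python
import numpy as np

myarray = np.array([[11, 12, 13],
                    [21, 22, 23],
                    [31, 32, 33]])

def myfunction(x):
    return x[0] + x[1]**2 + x[2]**3

# Same formula, applied to whole columns at once
vectorized = myarray[:, 0] + myarray[:, 1]**2 + myarray[:, 2]**3

# Row-by-row version from the answer, for comparison
looped = np.apply_along_axis(myfunction, axis=1, arr=myarray)

print(np.array_equal(vectorized, looped))  # True
```

Whether a given row function can be rewritten this way depends on the function; when it cannot, a plain loop or `apply_along_axis` remains a reasonable fallback.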
72

While you should certainly provide more information, if you are trying to go through each row, you can just iterate with a for loop:

import numpy
m = numpy.ones((3,5),dtype='int')
for row in m:
  print(row)
Noel Evans
matthew-parlette
    Isn't this an inefficient implementation? – Lokesh Mar 28 '17 at 17:59
  • 4
    @Lokesh, why is that? – Brendan Jul 21 '18 at 04:44
  • 2
    @Brendan It's pretty late, but looping over a NumPy array is usually expensive because the Python interpreter and the NumPy code have to exchange data on every iteration of the loop. – sohnryang Jun 07 '20 at 06:00
  • 2
    @sohnryang, thanks for the response. In the past I've iterated over numpy arrays via indices (e.g. `for i in range(m)`), and that hasn't been a performance bottleneck in my experience up to 100k iterations or so. This thread seems to indicate that the assignment of each row to the `row` variable may be the slow part, so index-based iteration may be the way to go here rather than variable assignment: https://stackoverflow.com/questions/39371021/efficient-loop-over-numpy-array – Brendan Jun 08 '20 at 17:56
  • I think it will produce a wrong answer in case there is a vector as an input. – Royi Aug 24 '22 at 18:38
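One way to read this comment: iterating over a 1-D array yields its individual elements, not a row. A minimal sketch, assuming a 1-D input should be treated as a single row, is to pass the input through `np.atleast_2d` first. The helper name `apply_to_rows` is hypothetical.

```python
import numpy as np

def apply_to_rows(func, arr):
    """Apply func to each row; a 1-D input is treated as a single row."""
    rows = np.atleast_2d(arr)  # shape (n,) becomes (1, n); 2-D input is unchanged
    return [func(row) for row in rows]

v = np.array([1, 2, 3])          # a vector
m = np.ones((3, 5), dtype=int)   # the matrix from the answer

# A plain loop over v visits three scalars, not one row
print(len([item for item in v]))

# With the helper, v is handled as one row and m as three rows
print(len(apply_to_rows(np.sum, v)))
print(len(apply_to_rows(np.sum, m)))
```

If a vector should be treated as a column instead, this assumption does not hold and the reshape would need to differ.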
8

Here's my take if you want to use multiple processes to process each row of a NumPy array:

from multiprocessing import Pool
import numpy as np

def my_function(x):
    pass     # do something and return something

if __name__ == '__main__':    
    X = np.arange(6).reshape((3,2))
    pool = Pool(processes = 4)
    results = pool.map(my_function, map(lambda x: x, X))
    pool.close()
    pool.join()

pool.map takes a function and an iterable.
I used the built-in 'map' function to create an iterator over the rows of the array.
There may be a better way to create the iterable, though.
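On the question of a better iterable: a 2-D NumPy array is itself iterable row by row, so it can be passed to `pool.map` directly, without the `map(lambda x: x, X)` wrapper. A sketch under that observation, where `row_total` is a placeholder standing in for the real per-row function:

```python
from multiprocessing import Pool
import numpy as np

def row_total(row):
    # Placeholder work: sum the row and return a plain Python int
    return int(row.sum())

X = np.arange(6).reshape((3, 2))

# Iterating the array yields the same rows as the map(lambda ...) wrapper
same_rows = all(np.array_equal(a, b) for a, b in zip(X, map(lambda x: x, X)))

if __name__ == '__main__':
    # The with-block shuts the pool down once the results are collected
    with Pool(processes=4) as pool:
        results = pool.map(row_total, X)
    print(results)  # [1, 5, 9]
```

For small rows, the cost of sending data between processes may outweigh any gain, so this is probably only worth it when the per-row function is expensive.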

hamster ham