numpy: how to apply function to every row of array

Question

I have a 2d numpy array called my_data. Each row represents information about one data point and each column represents different attributes of that data point.

I have a function called processRow. It takes in a row, does some processing on it, and returns the modified row. The returned row is longer than the input row (the function basically expands some categorical data into one-hot vectors).

How can I have a numpy array where every row has been processed by this function?

I tried

answer = np.array([])
for row in my_data:
    answer = np.append(answer,processRow(row))

but at the end, answer is just a single really long row rather than a 2D grid.

This code can't be correct, it gives `AttributeError: 'numpy.ndarray' object has no attribute 'append'`. Please include an entire snippet that we can run to demonstrate the issue. — Andy Hayden, May 06 '18 at 21:33
List append is useful. np.append has too many boobytraps. Reread its docs. What does it say about the axis parameter? — hpaulj, May 06 '18 at 22:21
np.append, like vstack, is slow. Without axis it ravels the inputs. — hpaulj, May 06 '18 at 23:27
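
To illustrate the point about the axis parameter, here is a small sketch (the sample array is made up for the demonstration, not taken from the question's data). It shows np.append flattening the result when axis is omitted, and keeping the 2D shape when axis=0 is given and each row is wrapped so it has two dimensions:

```python
import numpy as np

my_data = np.array([[1, 2], [3, 4]])  # made-up sample data

# Without axis, np.append ravels both inputs, so the result is 1-D.
flat = np.empty((0, 2), dtype=my_data.dtype)
for row in my_data:
    flat = np.append(flat, row)
print(flat.shape)  # (4,)

# With axis=0 both inputs must have the same number of dimensions,
# so each row is wrapped in a list to give it shape (1, 2).
stacked = np.empty((0, 2), dtype=my_data.dtype)
for row in my_data:
    stacked = np.append(stacked, [row], axis=0)
print(stacked.shape)  # (2, 2)
```

This still copies the array on every iteration, so it only explains the shape problem; it is not a fast way to build the result.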

score 2 · Answer 1 · answered May 06 '18 at 21:51

You can use np.vstack instead, since each row has a different shape from answer. You also need to give answer an explicit shape:

In [11]: my_data = np.array([[1, 2], [3, 4]])
    ...: process_row = lambda x: x  # do nothing

In [12]: answer = np.empty((0, 2), dtype='int64')
    ...: for row in my_data:
    ...:     answer = np.vstack([answer, process_row(row)])
    ...:

In [13]: answer
Out[13]:
array([[ 1,  2],
       [ 3,  4]])

However, you're probably better off using a list comprehension and passing the result to np.array afterwards:

In [21]: np.array([process_row(row) for row in my_data])
Out[21]:
array([[1, 2],
       [3, 4]])
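
Since the question is about rows that get longer, here is a rough sketch of the same list-comprehension pattern with a row-expanding function. This process_row is a hypothetical stand-in (the real one was not posted), and it assumes the last column holds integer category ids 0, 1 or 2:

```python
import numpy as np

N_CATEGORIES = 3  # assumption: category ids are 0, 1, 2


def process_row(row):
    """Hypothetical stand-in: replace the last column with a one-hot vector."""
    one_hot = np.zeros(N_CATEGORIES, dtype=row.dtype)
    one_hot[row[-1]] = 1
    return np.concatenate([row[:-1], one_hot])


my_data = np.array([[10, 0], [20, 2], [30, 1]])  # made-up sample data

# Build all processed rows in a Python list, then convert once.
answer = np.array([process_row(row) for row in my_data])
print(answer.shape)  # (3, 4)
```

The result is 2D as long as every processed row has the same length, because np.array is called once on the full list of rows.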

Andy Hayden

You may want to do this in cython, or pandas, to be more performant, but it's unclear what the best strategy is without more information on process_row. – Andy Hayden May 06 '18 at 21:52
I tried this but it was way too slow. It ran for 10 minutes before I quit it. I think `answer = np.vstack([answer, process_row(row)])` takes linear time because it copies `answer` each time, making it take O(n^2) time overall. @Aklys 's solution runs much faster so I'm accepting his solution – quantumbutterfly May 06 '18 at 22:32
@quantumbutterfly yes, his answer is the same as the one at the end of mine ("you're probably better off doing a list comprehension"), only more verbose. – Andy Hayden May 06 '18 at 22:35
Oh whoops, overlooked that part. Sorry about that – quantumbutterfly May 06 '18 at 22:38
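
On the copying cost raised in the comments above: one possible way to avoid re-copying the result on every iteration is to allocate the output once and fill it row by row. This is only a sketch, assuming every processed row has the same length; process_row here is a hypothetical stand-in that appends the row sum:

```python
import numpy as np


def process_row(row):
    # Hypothetical stand-in that lengthens the row by appending its sum.
    return np.append(row, row.sum())


my_data = np.arange(12).reshape(6, 2)  # made-up sample data

# Process the first row to learn the output width and dtype,
# then allocate the whole result once instead of growing it.
first = process_row(my_data[0])
answer = np.empty((len(my_data), first.shape[0]), dtype=first.dtype)
answer[0] = first
for i in range(1, len(my_data)):
    answer[i] = process_row(my_data[i])

print(answer.shape)  # (6, 3)
```

Each assignment writes into the existing buffer, so the total work grows linearly with the number of rows rather than quadratically.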

score 1 · Accepted Answer · answered May 06 '18 at 22:00

I'm not sure if I entirely got what you were after without seeing a sample of the data. But hopefully this helps you get to the result you want. I simplified the concept and just added one to each value in the row passed to the function and added the results together for a total (just to expand the size of the returned array). Of course you could adjust the processing to whatever you wanted.

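import numpy as np

def funky(x):
    # Add one to each value, then append the sum of the first two
    # so the returned row is longer than the input row.
    temp = []
    for value in x:
        value += 1
        temp.append(value)
    temp.append(temp[0] + temp[1])
    return np.array(temp)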

my_data = np.array([[1,1], [2,2]]) 

answer = np.apply_along_axis(funky, 1, my_data)
print("This is the original data:\n{}".format(my_data))
print("This is the adjusted data:\n{}".format(answer))

Below is the before and after of the array modification:

This is the original data:
[[1 1]
 [2 2]]
This is the adjusted data:
[[2 2 4]
 [3 3 6]]
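
For this particular toy function the per-row call is not strictly needed. As a sketch only, and it will not carry over to every processing function, the same result can be built with whole-array operations:

```python
import numpy as np

my_data = np.array([[1, 1], [2, 2]])

# Add one to every value at once, then append the sum of the
# first two columns as a new third column.
shifted = my_data + 1
answer = np.column_stack([shifted, shifted[:, 0] + shifted[:, 1]])
print(answer)
```

Whether a rewrite like this is possible depends entirely on what the real processing function does.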
