ndarray in NumPy: expensive ndarray iteration

Question

Here is an interesting observation:

```python
import numpy as np

# np.uint8 holds 0..255; the original np.int8 cannot store 255 (see the comments)
data = np.array([[255, 255, 255], [0, 0, 255], [255, 0, 0]], np.uint8)
for _ in range(1000000):
    for row in data:
        for col in row:
            flag = col > 0
```

The above code takes about 17 seconds to finish. If I first convert data to a list with `data = data.tolist()`, the whole thing takes less than 1 second.

I would like to know:

1. Why is comparing ndarray values so slow?
2. What's a more appropriate way to do the comparison without converting the ndarray to a list? Would it be faster than converting it to a list?

Thanks!

-------------- edited question: -------------

As @hpaulj pointed out, it's the iteration, not the value comparison, that's expensive. But I do need to iterate through the array. Is there a better way than converting it to a list?

It's not the value comparison that's expensive, it's the iteration. You are supposed to apply the comparison to the whole array. — hpaulj, Dec 18 '16 at 16:51
I found a bug: the np.int8 type cannot store 255. It's a signed type that can only store -128 to 127, so your data ends up as `np.array([[-1,-1,-1], [0, 0, -1], [-1, 0, 0]], np.int8)`. — gzc, Dec 18 '16 at 16:57
@hpaulj Ya. Agreed. Just figured. But I do need to iterate the whole ndarray. Any better way? — cheng, Dec 18 '16 at 17:04
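To illustrate the int8 point from the comments, here is a small sketch assuming a reasonably recent NumPy (`info`, `wrapped` and the `uint8` fix are illustrative choices, not from the question). Depending on the NumPy version, passing 255 directly to `np.array(..., np.int8)` either wraps silently, as described in the comment, or is rejected with an `OverflowError` in newer releases, so the sketch shows the wrap-around with an explicit `astype` cast instead:

```python
import numpy as np

# The signed 8-bit type only covers -128..127
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

# Casting an existing array to int8 wraps 255 around to -1
wrapped = np.array([255, 0, 255]).astype(np.int8)
print(wrapped)  # [-1  0 -1]

# Assumed fix: the unsigned 8-bit type covers 0..255, so 255 is stored as-is
data = np.array([[255, 255, 255], [0, 0, 255], [255, 0, 0]], np.uint8)
print(data.max())  # 255
```

With the wrapped int8 data, `-1 > 0` is `False`, so every "255" element would compare as not positive.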

score 1 · Answer 1 · answered Dec 18 '16 at 16:51


A more appropriate and efficient way is to use NumPy's element-wise comparison on the whole array:

```python
for _ in range(1000000):
    flag = data > 0
```
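A short sketch of what that single comparison returns and how it is typically used afterwards, assuming the data is stored as `np.uint8` so that 255 is kept intact (`flags`, `n_positive`, `rows`, `cols` are illustrative names):

```python
import numpy as np

data = np.array([[255, 255, 255], [0, 0, 255], [255, 0, 0]], np.uint8)

# One vectorized comparison replaces both inner loops and produces
# a boolean array with the same shape as data
flags = data > 0
print(flags)

# Typical follow-ups that also stay in compiled code
n_positive = np.count_nonzero(flags)   # how many elements are > 0
rows, cols = np.nonzero(flags)         # row and column indices of those elements
print(n_positive)  # 5
```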

gzc

8,180
8
42
62

score 0 · Accepted Answer · edited May 23 '17 at 12:00

From the side bar: Why is a `for` over a Python list faster than over a Numpy array?

The question of how to make iteration over an array faster comes up often, and the best answer is "don't"; or rather, push the iteration down into NumPy's compiled code. There's no way to make explicit iteration at the Python level significantly faster. Some tricks may give a 2x speedup, but not an order of magnitude.
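One way to see where the time goes: a Python loop over an ndarray has to box every element into a new NumPy scalar object, while a list produced by `tolist()` already holds plain Python ints. The sketch below (assuming `np.uint8` data; `loop_over` and `vectorized` are hypothetical helper names) checks the element types and times the three approaches with `timeit`. No timings are claimed here; they depend on the machine, and on an array this small the fixed per-call overhead is a large share of the vectorized cost, so larger arrays show the difference more clearly.

```python
import timeit
import numpy as np

data = np.array([[255, 255, 255], [0, 0, 255], [255, 0, 0]], np.uint8)
as_list = data.tolist()

# Pulling an element out of the ndarray in a Python loop creates a NumPy scalar;
# the list already stores plain Python ints
first_np = next(iter(data[0]))
first_py = as_list[0][0]
print(type(first_np), type(first_py))

def loop_over(container):
    # Same nested loop as in the question
    for row in container:
        for col in row:
            flag = col > 0

def vectorized():
    # Whole-array comparison done in compiled code
    flag = data > 0

for name, fn in [("ndarray loop", lambda: loop_over(data)),
                 ("list loop", lambda: loop_over(as_list)),
                 ("vectorized", vectorized)]:
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{name:>12}: {seconds:.3f} s")
```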

So in your case, the answer depends on what you are doing at each iteration. As gzc's answer shows, you can perform the comparison element by element with a single NumPy expression; you don't need to iterate to do that.
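Since the question doesn't say what the loop body really does, the sketch below assumes a hypothetical per-element rule ("map positive values to 1, everything else to 0") and shows it written three ways: a loop over `data.tolist()`, `np.where`, and boolean-mask assignment. The names `looped`, `vectorized` and `clipped` are illustrative.

```python
import numpy as np

data = np.array([[255, 255, 255], [0, 0, 255], [255, 0, 0]], np.uint8)

# Loop version: convert once with tolist(), then iterate over plain Python ints
looped = []
for row in data.tolist():
    looped.append([1 if col > 0 else 0 for col in row])

# Vectorized version of the same per-element rule
vectorized = np.where(data > 0, 1, 0)

# Boolean-mask assignment: overwrite only the matching elements of a copy
clipped = data.copy()
clipped[clipped > 0] = 1

print(vectorized)
```

When the per-element logic truly can't be expressed with array operations, iterating over `data.tolist()` as in the question is a reasonable fallback, because it avoids creating a NumPy scalar for every element.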
