Your next expression works:
In [793]: [next((i for i,x in enumerate(row) if x),None) for row in np.eye(10)]
Out[793]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
OK, that gives the index of the first nonzero, but in my sample case that's more interesting than the 1 value. A nonzero version gives the same result:
In [801]: [row.nonzero()[0][0] for row in np.eye(10)]
Out[801]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But if the array has a row with all 0s, such as in
arr = np.diag(np.arange(0,20,2))
the nonzero version raises an IndexError, because row.nonzero()[0] is empty for that row. It needs to handle the case where nonzero returns an empty array.
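One way to guard the nonzero version against all-zero rows is to check the size of the result before indexing it. This is only a sketch; the helper name first_nonzero_index and the choice of None as the "not found" value are my own assumptions, picked to match what the next expression returns.

```python
import numpy as np

def first_nonzero_index(row):
    """Index of the first nonzero entry of a 1-D array, or None if there is none.

    Hypothetical helper, not part of NumPy.
    """
    nz = row.nonzero()[0]                  # 1-D array of nonzero positions
    return int(nz[0]) if nz.size else None

arr = np.diag(np.arange(0, 20, 2))         # row 0 is all zeros
idx = [first_nonzero_index(row) for row in arr]
print(idx)
```

Note that a list containing None can no longer be used directly as an index array, so a numeric sentinel may be more convenient if the values are looked up afterwards.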
To get values from the idx list use
arr[np.arange(len(idx)), idx]
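As a small worked example of that lookup, assuming an array in which every row has a nonzero entry (so that idx holds only integers): the row indices 0..n-1 are paired element by element with the column indices in idx.

```python
import numpy as np

arr = np.diag(np.arange(1, 20, 2))            # 10x10, no all-zero rows
idx = [row.nonzero()[0][0] for row in arr]    # first nonzero column per row
values = arr[np.arange(len(idx)), idx]        # picks arr[i, idx[i]] for each row i
print(values)
```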
Timings
For a large diagonal array, the nonzero version is substantially faster:
In [822]: arr =np.diag(np.arange(1,2000,2))
In [823]: timeit idx = [next((i for i,x in enumerate(row) if x),None) for row in arr]
10 loops, best of 3: 87.6 ms per loop
In [824]: timeit [row.nonzero()[0][0] for row in arr]
100 loops, best of 3: 6.44 ms per loop
For a same-size array with all the 1s early in the row, the next approach is somewhat faster.
In [825]: arr = np.zeros_like(arr,int)
In [826]: arr[:,10]=1
In [827]: timeit idx = [next((i for i,x in enumerate(row) if x),None) for row in arr]
100 loops, best of 3: 3.61 ms per loop
In [828]: timeit [row.nonzero()[0][0] for row in arr]
100 loops, best of 3: 6.41 ms per loop
There's a trade-off between short-circuiting loops in Python and full loops in C code.
argmax is another way of finding the first nonzero index in each row:
idx = np.argmax(arr>0, axis=1)
With an axis parameter argmax has to iterate by row, and then within the row, but it does so in compiled code. With a boolean argument like this, argmax does short circuit. I've explored this in another question about argmax (or min) and nan values, which also short circuit:
https://stackoverflow.com/a/41324751/901925
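One caveat with the argmax approach: for a row with no True entry, argmax returns 0, which looks the same as a hit in column 0. A possible way to tell the two apart is to combine it with any along the same axis. The sentinel -1 below is an arbitrary choice of mine. Also note that arr>0 ignores negative entries; arr != 0 would be the test for nonzero in general.

```python
import numpy as np

arr = np.diag(np.arange(0, 20, 2))      # row 0 has no positive entry
mask = arr > 0
idx = np.argmax(mask, axis=1)           # row 0 yields 0 even though nothing was found
has_hit = mask.any(axis=1)              # True only for rows with a positive entry
safe_idx = np.where(has_hit, idx, -1)   # -1 marks "not found" (assumed sentinel)
print(safe_idx)
```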
Another possibility (channeling @Divakar?):
def foo(arr):
    I, J = np.where(arr > 0)
    u, i = np.unique(I, return_index=True)
    return J[i]
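This where/unique approach silently skips rows that have no positive entry, so the result can be shorter than the number of rows. A variant that also returns the row numbers makes the pairing explicit. This is a sketch only; the name first_positive is hypothetical.

```python
import numpy as np

def first_positive(arr):
    """Return (rows, cols): for each row with a positive entry, its first such column."""
    I, J = np.where(arr > 0)                    # hits in row-major order
    u, first = np.unique(I, return_index=True)  # first hit position for each row present
    return u, J[first]

arr = np.diag(np.arange(0, 20, 2))              # row 0 is all zeros
rows, cols = first_positive(arr)
print(rows)
print(cols)
```

It relies on np.where returning indices in row-major order, so the first occurrence of each row number in I corresponds to the smallest column index in that row.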