Find first non-zero value in each column of pandas DataFrame

Question

What is an idiomatic pandas way to get the value and index of the first non-zero element in each column of a DataFrame (top to bottom)?

import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

print(df.head())

#    first  second  third
# 0      0       0      0
# 1      0      10      0
# 2      4       0      0
# 3      1       2      3

What I would like to achieve:

#        value  pos
# first      4    2
# second    10    1
# third      3    3

piRSquared · Accepted Answer · 2018-05-29T14:13:56.827

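You're looking for idxmax, which gives you the position of the first occurrence of the maximum. However, you need the maximum of "not equal to zero", so apply it to the boolean mask df.ne(0), where the first True in each column marks the first non-zero value: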

df.ne(0).idxmax()

first     2
second    1
third     3
dtype: int64

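We can couple this with lookup and assign (DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this needs an older version):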

df.ne(0).idxmax().to_frame('pos').assign(val=lambda d: df.lookup(d.pos, d.index))

        pos  val
first     2    4
second    1   10
third     3    3

Same answer packaged slightly differently.

m = df.ne(0).idxmax()
pd.DataFrame(dict(pos=m, val=df.lookup(m, m.index)))

        pos  val
first     2    4
second    1   10
third     3    3
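
For pandas versions without DataFrame.lookup, the same table can be sketched with NumPy integer indexing. This is a swapped-in alternative, not the answer's original method; the names rows, cols and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

# Integer position of the first True (first non-zero entry) in each column.
rows = df.ne(0).to_numpy().argmax(axis=0)
cols = np.arange(df.shape[1])

# Pick one value per column with paired (row, column) integer indices,
# and translate the row positions back to index labels for 'pos'.
result = pd.DataFrame({'pos': df.index[rows].to_numpy(),
                       'val': df.to_numpy()[rows, cols]},
                      index=df.columns)
print(result)
```

As with idxmax, a column containing only zeros would report the first row here, so the caveat about all-zero columns still applies.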

edited May 29 '18 at 14:13

answered May 29 '18 at 13:59

piRSquared

idxmax() will give you the maximum value, which happens to be the first value of the columns (top-down) in this case. What would be the best way to detect the first non-zero value? – Meruemu Feb 20 '20 at 15:33
Doesn't really work if all values are zero. In that case, you'll get the same answer as when the first value is non-zero (pos=0). – shaneb May 13 '20 at 15:40
If all values are zero then zero is the maximum. – piRSquared May 13 '20 at 15:47
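
The all-zero case raised in the comments can be told apart from a genuine hit at position 0 by also checking the mask with any(). A minimal sketch, using a hypothetical two-column frame that is not part of the question:

```python
import pandas as pd

# Hypothetical data: column 'b' contains only zeros.
df = pd.DataFrame({'a': [0, 5, 0], 'b': [0, 0, 0]})

mask = df.ne(0)

# idxmax() alone reports 0 for 'b'; where() replaces that entry with NaN
# because mask.any() is False for a column with no non-zero values.
pos = mask.idxmax().where(mask.any())
print(pos)
```

The result holds NaN for columns without any non-zero value, which turns the positions into floats; whether that trade-off is acceptable depends on how the positions are used afterwards.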

score 4 · Answer 2 · answered May 29 '18 at 14:01

Here's the long-winded way, which should be faster if your non-zero values tend to occur near the start of large arrays, because next stops scanning a column at its first non-zero value:

import pandas as pd

df = pd.DataFrame([[0, 0, 0],[0, 10, 0],[4, 0, 0],[1, 2, 3]],
                  columns=['first', 'second', 'third'])

res = [next(((j, i) for i, j in enumerate(df[col]) if j != 0), (0, 0)) for col in df]

df_res = pd.DataFrame(res, columns=['value', 'position'], index=df.columns)

print(df_res)

        value  position
first       4         2
second     10         1
third       3         3
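
The same short-circuiting idea can be wrapped in a helper that returns index labels instead of integer positions and signals an all-zero column with (None, None) rather than (0, 0). This is a sketch only; first_nonzero and the letter index are hypothetical names introduced for illustration.

```python
import pandas as pd

def first_nonzero(series):
    """Return (value, label) of the first non-zero entry, or (None, None)."""
    # next() stops at the first match, so the rest of the column is not scanned.
    return next(((value, label) for label, value in series.items() if value != 0),
                (None, None))

df = pd.DataFrame([[0, 0, 0], [0, 10, 0], [4, 0, 0], [1, 2, 3]],
                  columns=['first', 'second', 'third'],
                  index=list('abcd'))  # hypothetical non-default index

res = {col: first_nonzero(df[col]) for col in df}
print(res)
```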

score 3 · Answer 3 · answered May 29 '18 at 14:05

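I would use stack; the resulting index holds the row and column labels. Note that this selects each row's non-zero maximum, which coincides with the first non-zero value of each column for this sample data.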

df[df.eq(df.max(1),0)&df.ne(0)].stack()
Out[252]: 
1  second    10.0
2  first      4.0
3  third      3.0
dtype: float64

BENY

score 1 · Answer 4 · answered Feb 06 '22 at 02:19

You can also use NumPy's nonzero function for this.

# nonzero()[0][0] is the first non-zero position; it raises IndexError for an all-zero column.
positions = [df[col].to_numpy().nonzero()[0][0] for col in df]
df_res = pd.DataFrame({'value': df.to_numpy()[(positions, range(df.shape[1]))],
                       'position': positions}, index=df.columns)
print(df_res)

        value  position
first       4         2
second     10         1
third       3         3
