Identify first non-zero element within group composed of multiple columns in pandas

Question

I have a dataframe that looks like the following. The rightmost column is my desired column:

Group1  Group2  Value   Target_Column
1        3         0      0
1        3         1      1
1        4         1      1
1        4         1      0
2        5         5      5
2        5         1      0
2        6         0      0
2        6         1      1
2        6         9      0

How do I identify the first non-zero value in a group that is made up of two columns(Group1 & Group2) and then create a column that shows the first non-zero value and shows all else as zeroes?

This question is very similar to one posed earlier here: Identify first non-zero element within a group in pandas but that solution gives an error on groups based on multiple columns.

I have tried:

import pandas as pd
dt = pd.DataFrame({'Group1': [1,1,1,1,2,2,2,2,2], 'Group2': [3,3,4,4,5,5,6,6,6],  'Value': [0,1,1,1,5,1,0,1,9]})
dt['Newcol']=0
dt.loc[dt.Value.ne(0).groupby(dt['Group1','Group2']).idxmax(),'Newcol']=dt.Value

Why are there multiple 1s for some groups? – user3483203 Sep 01 '18 at 19:47 — user3483203, Sep 01 '18 at 19:47
@user3483203 thank you for catching it, I edited it. – gibbz00 Sep 01 '18 at 19:50 — gibbz00, Sep 01 '18 at 19:50

user3483203 · Accepted Answer · 2018-09-01T20:30:01.483

1

Setup

df['flag'] = df.Value.ne(0)

Using `numpy.where` and `assign`:

df.assign(
    target=np.where(df.index.isin(df.groupby(['Group1', 'Group2']).flag.idxmax()),
    df.Value, 0)
).drop('flag', 1)

Using `loc` and `assign`

df.assign(
    target=df.loc[df.groupby(['Group1', 'Group2']).flag.idxmax(), 'Value']
).fillna(0).astype(int).drop('flag', 1)

Both produce:

   Group1  Group2  Value  target
0       1       3      0       0
1       1       3      1       1
2       1       4      1       1
3       1       4      1       0
4       2       5      5       5
5       2       5      1       0
6       2       6      0       0
7       2       6      1       1
8       2       6      9       0

edited Sep 01 '18 at 20:30

answered Sep 01 '18 at 19:55

user3483203

50,081
9
65
94

Thank you for the solution. My goal is to find the first occurrence of non-zero element whereas `idxmax` finds the index of max value. I have added one more row to better illustrate the point. Even though 9 is max, first non-zero element 1 appears before it. – gibbz00 Sep 01 '18 at 20:23
@gibbz00 I added an additional setup step where you create a `flag` column. Try that, then try my approaches – user3483203 Sep 01 '18 at 20:30

BENY · Answer 2 · 2018-09-01T20:34:48.963

The number may off, since when there are only have two same values, I do not know you need the which one.

Using user3483203 's setting up

df['flag'] = df.Value.ne(0)

df['Target']=df.sort_values(['flag'],ascending=False).drop_duplicates(['Group1','Group2']).Value

df['Target'].fillna(0,inplace=True)
df
Out[20]: 
   Group1  Group2  Value  Target_Column  Target
0       1       3      0              0     0.0
1       1       3      1              1     1.0
2       1       4      1              1     1.0
3       1       4      1              0     0.0
4       2       5      5              5     5.0
5       2       5      1              0     0.0
6       2       6      0              0     0.0
7       2       6      1              1     1.0

Thank you for the solution. My goal is to find the first occurrence of non-zero element . I have added one more row to better illustrate the point. Even though 9 is max, first non-zero element 1 appears before it. Thereby that row gets the `df.Value` of 1 and everything else is 0. — gibbz00, Sep 01 '18 at 20:26

Identify first non-zero element within group composed of multiple columns in pandas

2 Answers2

Using numpy.where and assign:

Using loc and assign

Using `numpy.where` and `assign`:

Using `loc` and `assign`