Python: Groupby First Non NaN Value

Question

I have the following dataframe:

id    number   
1      13
1      13
1      NaN
1      NaN
2      11
2      11
2      11
2      NaN

I want to find the first non-NaN value per id and mark it with a 1. The result should look like this:

id    number     code   
1      13         NaN
1      13         1
1      NaN        NaN
1      NaN        NaN
2      11         NaN
2      11         NaN
2      11         1
2      NaN        NaN

I tried the following command, planning to go from there:

df["test"] = df.groupby("id")["number"].first_valid_index()

It gives me the following error: Cannot access callable attribute 'first_valid_index' of 'SeriesGroupBy' objects, try using the 'apply' method

Then I tried this:

df['test'] = df.groupby("id")['number'].apply(lambda x: x.first_valid_index())

But this just gives me a column of NaT values...

Does anybody know how the problem could be solved efficiently?

That looks like the last valid index? – cs95, Feb 04 '18 at 22:11

score 5 · Accepted Answer · answered Feb 04 '18 at 22:13

Assuming you mean last_valid_index, you can apply last_valid_index to each group and use loc to assign -

df.loc[df.groupby('id').number.apply(pd.Series.last_valid_index), 'code'] = 1
df

   id  number  code
0   1    13.0   NaN
1   1    13.0   1.0
2   1     NaN   NaN
3   1     NaN   NaN
4   2    11.0   NaN
5   2    11.0   NaN
6   2    11.0   1.0
7   2     NaN   NaN
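
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The frame construction is my reconstruction of the data in the question, and the variable name `last_idx` is only illustrative.

```python
import numpy as np
import pandas as pd

# Reconstruction of the example frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "number": [13, 13, np.nan, np.nan, 11, 11, 11, np.nan],
})

# One index label per id: the label of the last non-NaN value in that group.
last_idx = df.groupby("id")["number"].apply(pd.Series.last_valid_index)

# Assigning through .loc creates the 'code' column; rows not listed stay NaN.
df.loc[last_idx, "code"] = 1
print(df)
```

Note that last_valid_index returns None for a Series with no valid values, so a group consisting only of NaN would have to be filtered out before the loc call.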

Alternatively, using groupby + idxmax -

df.loc[df.number.notnull().cumsum().groupby(df.id).idxmax(), 'code'] = 1
df

   id  number  code
0   1    13.0   NaN
1   1    13.0   1.0
2   1     NaN   NaN
3   1     NaN   NaN
4   2    11.0   NaN
5   2    11.0   NaN
6   2    11.0   1.0
7   2     NaN   NaN
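
To see why the idxmax version works, it may help to look at the intermediate steps. The sketch below splits the one-liner apart; the frame is my reconstruction of the question's data and the names `running` and `last_idx` are only illustrative.

```python
import numpy as np
import pandas as pd

# Reconstruction of the example frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "number": [13, 13, np.nan, np.nan, 11, 11, 11, np.nan],
})

# Running count of non-NaN values over the whole column. The count stops
# increasing once a group's trailing NaNs begin.
running = df["number"].notnull().cumsum()
print(running.tolist())  # [1, 2, 2, 2, 3, 4, 5, 5]

# idxmax returns the label of the first occurrence of each group's maximum,
# i.e. the row where the count last increased: the last non-NaN row.
last_idx = running.groupby(df["id"]).idxmax()
print(last_idx.tolist())  # [1, 6]

df.loc[last_idx, "code"] = 1
```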

@BallpointBen I suppose it would spuriously assign a 1 to the last row in the group. — cs95, Aug 23 '18 at 18:27

BENY · Answer 2 · 2018-02-05T00:44:49.557

3

You can use this...

df.loc[df.groupby('id').apply(lambda x : x['number'].dropna()).reset_index(level=1)['level_1'].groupby(level=0).max(),'code']=1
df
Out[628]: 
   id  number  code
0   1    13.0   NaN
1   1    13.0   1.0
2   1     NaN   NaN
3   1     NaN   NaN
4   2    11.0   NaN
5   2    11.0   NaN
6   2    11.0   1.0
7   2     NaN   NaN
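
A shorter variant of the same idea (drop the NaN rows, then pick the last remaining row per id) might look like the sketch below. It swaps the apply/reset_index chain for groupby(...).tail(1), which is not what this answer uses, and it relies on row order rather than on the largest index label, so it assumes the frame is already in the desired order. The frame is my reconstruction of the question's data.

```python
import numpy as np
import pandas as pd

# Reconstruction of the example frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "number": [13, 13, np.nan, np.nan, 11, 11, 11, np.nan],
})

# Keep only rows with a valid number; the original index labels are preserved.
valid = df.dropna(subset=["number"])

# Last remaining row of each id, identified by its original index label.
last_idx = valid.groupby("id").tail(1).index

df.loc[last_idx, "code"] = 1
print(df)
```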

edited Feb 05 '18 at 00:44

answered Feb 04 '18 at 23:17

@piRSquared yep, fix it :-) – BENY Feb 05 '18 at 00:44
