Merging Two Rows (one with a value, the other NaN) in Pandas

Question

I am aware similar questions have been asked before (How to merge two rows in a dataframe pandas, etc), but I am still struggling to do the following (except with pandas dataframe with many rows):

    team_token  day1    day2   day3  day4
0   abc          1      NaN     NaN   NaN
1   abc          NaN     1      NaN   NaN
2   abc          NaN     NaN    NaN    NaN
3   abc          NaN     NaN    NaN     1

I want to combine the rows with the same team_token so that the end result looks like:

    team_token  day1    day2  day3  day4
0   abc           1      1    NaN     1

Thank you in advance.

put your code, what you have tried yet @cgp25 – Sanket Jun 28 '17 at 09:56 — Sanket, Jun 28 '17 at 09:56
I tried df.combine_first, but to no avail – cgp25 Jun 28 '17 at 09:59 — cgp25, Jun 28 '17 at 09:59
you want to remove NaN values? – Hardik Sondagar Jun 28 '17 at 10:02 — Hardik Sondagar, Jun 28 '17 at 10:02
No, they can remain @HardikSondagar – cgp25 Jun 28 '17 at 10:51 — cgp25, Jun 28 '17 at 10:51

score 8 · Accepted Answer · edited Mar 21 '22 at 20:28

Use combine_first on two Series given by the two rows of you dataframe:

import pandas as pd
df = pd.DataFrame({'team_token':['abc', 'abc'], 'day1': [1, None], 'day2' : [None, 1]})

df.loc[0].combine_first(df.loc[1])

gives:

 team_token  day1    day2
0   abc           1       1

#EDIT: A better solution that works also with your updated answer is simply:

df.max()

assuming that your day columns contain either Nan or ones.

In case you have different team_tokens:

   day1  day2  day3 team_token
0   1.0   NaN   NaN        abc
1   NaN   NaN   NaN        abc
2   1.0   1.0   NaN        abc
3   NaN   NaN   NaN        abc
4   1.0   NaN   1.0        ddd
5   1.0   NaN   1.0        ddd
6   NaN   NaN   1.0        ddd

you can group_by and still take the max of the group:

df.groupby('team_token').max().reset_index()

#  team_token  day1  day2  day3
#0        abc   1.0   1.0   NaN
#1        ddd   1.0   NaN   1.0

I will try! I realize now my 1s are actually strings, so will fix and try the above, which should work! Thanks! — cgp25, Jun 28 '17 at 11:11

score 0 · Answer 2 · answered Jun 28 '17 at 10:24

0

If there is only one non-NaN value for each team_token, you can use groupby/sum

df.groupby('team_token')['day1', 'day2'].sum().reset_index()

You get

    team_token  day1    day2
0   abc         1.0     1.0

answered Jun 28 '17 at 10:24

Vaishali

37,545
5
58
86

score 0 · Answer 3 · answered Jul 03 '19 at 14:26

Maybe, not the shortest solution, but imho more flexible.

import pandas as pd
import numpy as np

df = pd.DataFrame({'team_token':['abc', 'abc', 'abc', 'abc'], 
                   'day1': [1, None, None, None], 
                   'day2': [None, 1, None, None],
                   'day3': [None, None, None, None],
                   'day4': [None, None, None, 1]})
df

def replace_nan_func(x):
    x = x[~pd.isna(x)]
    if len(x) > 0:
        return x.iloc[0]
    else:
        return np.NaN

df_valid = df.groupby(by='team_token').agg(dict.fromkeys(df.columns[1:], replace_nan_func))
df_valid

Merging Two Rows (one with a value, the other NaN) in Pandas

3 Answers3

Linked