5

I am aware similar questions have been asked before (How to merge two rows in a dataframe pandas, etc), but I am still struggling to do the following (except with pandas dataframe with many rows):

    team_token  day1    day2   day3  day4
0   abc          1      NaN     NaN   NaN
1   abc          NaN     1      NaN   NaN
2   abc          NaN     NaN    NaN    NaN
3   abc          NaN     NaN    NaN     1

I want to combine the rows with the same team_token so that the end result looks like:

    team_token  day1    day2  day3  day4
0   abc           1      1    NaN     1

Thank you in advance.

cgp25
  • 335
  • 1
  • 5
  • 15

3 Answers3

8

Use combine_first on two Series given by the two rows of you dataframe:

import pandas as pd
df = pd.DataFrame({'team_token':['abc', 'abc'], 'day1': [1, None], 'day2' : [None, 1]})

df.loc[0].combine_first(df.loc[1])

gives:

 team_token  day1    day2
0   abc           1       1 

#EDIT: A better solution that works also with your updated answer is simply:

df.max()

assuming that your day columns contain either Nan or ones.

In case you have different team_tokens:

   day1  day2  day3 team_token
0   1.0   NaN   NaN        abc
1   NaN   NaN   NaN        abc
2   1.0   1.0   NaN        abc
3   NaN   NaN   NaN        abc
4   1.0   NaN   1.0        ddd
5   1.0   NaN   1.0        ddd
6   NaN   NaN   1.0        ddd

you can group_by and still take the max of the group:

df.groupby('team_token').max().reset_index()

#  team_token  day1  day2  day3
#0        abc   1.0   1.0   NaN
#1        ddd   1.0   NaN   1.0
Wtower
  • 18,848
  • 11
  • 103
  • 80
VinceP
  • 2,058
  • 2
  • 19
  • 29
0

If there is only one non-NaN value for each team_token, you can use groupby/sum

df.groupby('team_token')['day1', 'day2'].sum().reset_index()

You get

    team_token  day1    day2
0   abc         1.0     1.0
Vaishali
  • 37,545
  • 5
  • 58
  • 86
0

Maybe, not the shortest solution, but imho more flexible.

import pandas as pd
import numpy as np

df = pd.DataFrame({'team_token':['abc', 'abc', 'abc', 'abc'], 
                   'day1': [1, None, None, None], 
                   'day2': [None, 1, None, None],
                   'day3': [None, None, None, None],
                   'day4': [None, None, None, 1]})
df

enter image description here

def replace_nan_func(x):
    x = x[~pd.isna(x)]
    if len(x) > 0:
        return x.iloc[0]
    else:
        return np.NaN

df_valid = df.groupby(by='team_token').agg(dict.fromkeys(df.columns[1:], replace_nan_func))
df_valid

enter image description here

Kirill Dolmatov
  • 327
  • 5
  • 11