2

I have a table which looks like this -

date        name        values      items
01-03-2019  xyz         900         NaN
NaN         NaN         900         brush
02-03-2019  abc        1200         NaN
NaN         NaN         900         paste
NaN         NaN         300         floss

and desired output is -

date            name       values        items
01-03-2019      xyz        900           brush
02-03-2019      abc        1200          paste, floss

I know I can use np.where() to make the values of each item NaN, but I don't know how I should go about the concatenation and then move the result up alongside the name.

rik_maz
  • 69
  • 5
  • Possible duplicate of [Python Pandas: Groupby Sum AND Concatenate Strings](https://stackoverflow.com/questions/47600818/python-pandas-groupby-sum-and-concatenate-strings) – Ankur Sinha Apr 23 '19 at 09:09

2 Answers2

2

Use GroupBy.agg with first column and ffill for forward filling missing values and dictionary, which is created dynamic - for each column without date and items is applied aggregate function GroupBy.first and for last column items is used join with removed missing values:

d = dict.fromkeys(df.columns.difference(['date','items']), 'first')
d['items'] = lambda x: ', '.join(x.dropna())
df = df.groupby(df['date'].ffill()).agg(d).reset_index()
print (df)
         date name  values         items
0  01-03-2019  xyz     900         brush
1  02-03-2019  abc    1200  paste, floss

If only few columns is possible pass 2 columns to groupby with ffill and create dictionary passed to agg:

df = (df.groupby([df['date'].ffill(), df['name'].ffill()])
       .agg({'values':'first', 'items':lambda x: ', '.join(x.dropna())})
       .reset_index())
print (df)
         date name  values         items
0  01-03-2019  xyz     900         brush
1  02-03-2019  abc    1200  paste, floss
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

We can use fillna with forwardfill (ffill). Then dropna and at the same time groupby with agg to get the sum of value and concat the strings in items:

df[['date', 'name']] = df[['date', 'name']].fillna(method='ffill')

df = df.dropna().groupby('date').agg({'name':'first',
                                      'values':'sum',
                                      'items':', '.join}).reset_index()

print(df)
         date name  values         items
0  01-03-2019  xyz     900         brush
1  02-03-2019  abc    1200  paste, floss

Erfan
  • 40,971
  • 8
  • 66
  • 78