0

I have the initial df and I want to aggregate the 'combo' column into a unique string, separated by slashes, but respecting the order indicated in the sort. In desired data you can find my final target dataset


raw_data = {'name': ['B','B','A','A','A','A','C'],
'date' : pd.to_datetime(pd.Series(['2017-04-03','2017-04-03','2017-03-31','2017-03-31','2017-03-31','2017-04-04','2017-04-04'])),
        'order': [2,1,4,2,1,1,1],
           'combo': ['x','y','x','y','z','x','x']}
df = pd.DataFrame(raw_data, columns = ['name','date','order','combo'])
df=df.sort_values(["name","date","order"])
df


desired_raw = {'name': ['A','A','B','C'],
'date' : pd.to_datetime(pd.Series(['2017-03-31','2017-04-04','2017-04-03','2017-04-04'])),
'combined_combo': ["z/y/x","x","y/x","x"]}

desired_data = pd.DataFrame(desired_raw, columns = ['name','date','combined_combo'])

desired_data

#what I did until now

df1 = df.groupby(['name','date'])['combo'].apply(list).reset_index(name='new')
df1

progster
  • 877
  • 3
  • 15
  • 27
  • The answer already exists here: [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/a/59284106/1609514). – Bill Apr 12 '22 at 15:43

1 Answers1

0

Here is one way:

combined_combo = df.groupby(['name', 'date'])['combo'].agg('/'.join).rename('combined_combo')
print(combined_combo)

Out:

name  date      
A     2017-03-31    z/y/x
      2017-04-04        x
B     2017-04-03      y/x
C     2017-04-04        x
Name: combined_combo, dtype: object

If you don't want the groups as the index use:

desired_data = combined_combo.reset_index()
Bill
  • 10,323
  • 10
  • 62
  • 85