1

I have a Dataframe containg text and some value counts e.g.:

dates=['01-01-15','01-01-15','01-01-15','02-01-15','02-01-15','02-01-15','02-01-15']
df3 = pd.DataFrame({'Number':['001','001','001','002','002','002','002'],
                   'name':['peter','chris','meg','albert','cathrine','leo','leo'],
                   'dummy':[0,1,0,0,0,1,1],
                   'dates': dates})
df3.dates=pd.to_datetime(df3.dates)

Now i want to group it by the "Number" column and sum, so that the values will be summed and the text will become one list with all entries, and each group will have one date.

If i use df4=pd.DataFrame(df3.groupby('Number').sum()) it performs the necessary operations on the numeric data, but loses the text and date column.

So the output shold look like:

df4
Number      name                   dummy    dates
001     [peter,chris,meg]            1    01-01-15
002     [albert, cathrine, leo,leo]  2    02-01-15
user11638654
  • 305
  • 2
  • 12
  • Related: Named aggregations as described [here.](https://stackoverflow.com/a/54300159) – cs95 Jul 21 '19 at 22:33

1 Answers1

4

You want to use .agg to specify multiple aggregation functions, like list, sum and first:

df3.groupby('Number', as_index=False).agg({'name':list,
                                           'dummy':'sum',
                                           'dates':'first'})

Output

  Number                          name  dummy      dates
0    001           [peter, chris, meg]      1 2015-01-01
1    002  [albert, cathrine, leo, leo]      2 2015-02-01

Pandas version >= 0.25

Since verison 0.25 we are moving away from the dictionary and using tuples instead.

See docs

df3.groupby('Number', as_index=False).agg(name_list=('name',list),
                                          dummy_sum=('dummy','sum'),
                                          dates=('dates','first'))

Output

                      name_list  dummy_sum      dates
0           [peter, chris, meg]          1 2015-01-01
1  [albert, cathrine, leo, leo]          2 2015-02-01
cs95
  • 379,657
  • 97
  • 704
  • 746
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • ah this works perfectly, but do i have to specify the function for each single column? because on my real dataframe i have 30 columns for most of which i have to sum the data – user11638654 Jul 21 '19 at 19:35
  • 1
    No you dont have to, you can split your dataframe and only keep the `Number` column and all the rest which you want to sum and just use `df3.groupby('Number').sum()`. After that join it back with your categorical data. @user11638654. – Erfan Jul 21 '19 at 19:39