
I have a dataframe:

import pandas as pd

df = pd.DataFrame({'col0':[[0,1],[1,0,0],[1,0],[1,0],[2,0]],
                   'col1':[5,4,3,2,1]})

i.e.:

        col0  col1
0     [0, 1]     5
1  [1, 0, 0]     4
2     [1, 0]     3
3     [1, 0]     2
4     [2, 0]     1

I would like to group by the values in col0 and sum the col1 values within each group. I tried:

df.groupby('col0').col1.sum()

but this raises TypeError: unhashable type: 'list'. I then tried:

df.groupby(df.col0.apply(frozenset)).col1.sum()

which gives:

col0
(0, 1)    14
(0, 2)     1
Name: col1, dtype: int64

That is, the lists were converted into sets (frozensets, to be exact) and then grouped. Neither the number of elements nor their order mattered: [1,0] and [0,1] belong to the same group, and so do [1,0] and [1,0,0].
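A quick check illustrates why these lists collapse into one group: a frozenset keeps only the distinct elements, so order and repeats are lost before the grouping happens. This is only a small sketch of that behaviour, and the variable names are illustrative.

```python
# frozenset keeps only the distinct elements, so order and repeats are discarded
a = frozenset([0, 1])
b = frozenset([1, 0])
c = frozenset([1, 0, 0])

# Equal frozensets also hash equally, so groupby treats them as a single key
same_key = (a == b == c) and (hash(a) == hash(b) == hash(c))
print(same_key)
```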

If the order and the number of elements should also matter, how do I group then?

Desired output of grouping by col0 and summing col1 for the dataframe above:

col0
[0, 1]       5
[1, 0, 0]    4
[1, 0]       5
[2, 0]       1
Name: col1, dtype: int64
zabop

2 Answers


A tuple is immutable (and therefore hashable), can contain duplicates, and preserves order, so it works as a group key where a list does not.

df['col0'] = df['col0'].apply(tuple)
df.groupby('col0', sort=False).sum() # sort=False for original order of col0 
#            col1
# col0           
# (0, 1)        5
# (1, 0, 0)     4
# (1, 0)        5
# (2, 0)        1
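If the original lists in col0 should be left untouched, one possible variation is to build the tuple keys as a separate Series and pass that to groupby instead of overwriting the column. This is a sketch under that assumption; the names `keys` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

# Build hashable tuple keys without modifying df; df['col0'] still holds lists
keys = df['col0'].apply(tuple)

# sort=False keeps the groups in order of first appearance
result = df.groupby(keys, sort=False)['col1'].sum()
print(result)
```

Here [1, 0] and [0, 1] stay in separate groups because tuples compare element by element, and [1, 0, 0] is distinct because its length differs.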
Ch3steR

You can convert to string just for grouping:

import pandas as pd
df = pd.DataFrame({'col0':[[0,1],[1,0,0],[1,0],[1,0],[2,0]],
                   'col1':[5,4,3,2,1]})
# select col1 so that the list column itself is not aggregated
df.groupby(df['col0'].astype(str))['col1'].sum()
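One thing to keep in mind with this approach is that the group labels in the result are strings such as '[1, 0]', not lists. If the lists are needed again afterwards, a possible way to recover them is to parse the labels with ast.literal_eval. The sketch below assumes the lists contain only plain Python literals; `summed` and `keys_as_lists` are illustrative names.

```python
import ast
import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

# Group on the string form of each list; sort=False keeps first-appearance order
summed = df.groupby(df['col0'].astype(str), sort=False)['col1'].sum()

# The index labels are strings; parse them back into Python lists if needed
keys_as_lists = [ast.literal_eval(label) for label in summed.index]
print(summed)
print(keys_as_lists)
```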
OnY