Groupby lists in Pandas

Question

I have a dataframe:

df = pd.DataFrame({'col0':[[0,1],[1,0,0],[1,0],[1,0],[2,0]],
                   'col1':[5,4,3,2,1]})

ie:

        col0  col1
0     [0, 1]     5
1  [1, 0, 0]     4
2     [1, 0]     3
3     [1, 0]     2
4     [2, 0]     1

I would like to group by the values in col0 and sum the col1 values within each group. I tried:

df.groupby('col0').col1.sum()

but this raises TypeError: unhashable type: 'list'. I then tried:

df.groupby(df.col0.apply(frozenset)).col1.sum()

which gives:

col0
(0, 1)    14
(0, 2)     1
Name: col1, dtype: int64

I.e. the lists were converted into sets (frozensets, to be exact) and then grouped. Neither the number of elements nor their order mattered (i.e. [1,0] and [0,1] belong to the same group, as do [1,0] and [1,0,0]).

If the order and number of elements should also matter, how do I group then?

Desired output of grouping by col0 and summing col1 for the above dataframe:

col0
[0, 1]       5
[1, 0, 0]    4
[1, 0]       5
[2, 0]       1
Name: col1, dtype: int64

@mozway That's a very enlightening way of putting it, thanks. — zabop, Jan 06 '22 at 13:18
@mozway What is to dicts what frozenset is to set and tuple is to list? — zabop, Jan 19 '22 at 10:40
I don't think it exists, [see here](https://stackoverflow.com/questions/2703599/what-would-a-frozen-dict-be), but usually what matters most are the keys. So I guess a frozenset/tuple of keys, or of key/value pairs ;) — mozway, Jan 19 '22 at 11:07
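A minimal sketch of the "frozenset of key/value pairs" idea from the comment above, assuming a hypothetical dataframe whose col0 holds dicts with hashable values. The sample data is made up for illustration and is not from the question:

```python
import pandas as pd

# Hypothetical example data: col0 holds dicts instead of lists.
df = pd.DataFrame({'col0': [{'a': 1}, {'a': 1, 'b': 2}, {'b': 2, 'a': 1}, {'a': 2}],
                   'col1': [5, 4, 3, 2]})

# A frozenset of (key, value) pairs is hashable and ignores insertion order.
# This only works if every dict value is itself hashable.
key = df['col0'].map(lambda d: frozenset(d.items()))

# sort=False keeps the groups in order of first appearance.
out = df.groupby(key, sort=False)['col1'].sum()
print(out)
```

Here {'a': 1, 'b': 2} and {'b': 2, 'a': 1} land in the same group, because two dicts with the same items produce equal frozensets.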

Ch3steR · Accepted Answer · 2022-01-06T13:13:19.303

2

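A tuple is immutable (and therefore hashable as long as its elements are), can contain duplicates, and preserves order, so converting each list to a tuple gives a valid groupby key that distinguishes [1, 0] from both [0, 1] and [1, 0, 0].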

df['col0'] = df['col0'].apply(tuple)
df.groupby('col0', sort=False).sum() # sort=False for original order of col0 
#            col1
# col0           
# (0, 1)        5
# (1, 0, 0)     4
# (1, 0)        5
# (2, 0)        1

edited Jan 06 '22 at 13:13

answered Jan 06 '22 at 13:10

Ch3steR


2

That would be my answer as well. After grouping you can easily revert back to list if needed – Daniel Wlazło Jan 06 '22 at 13:10
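One possible way to do the conversion back to lists that the comment mentions, sketched under the assumption that the original list column should stay untouched. The names key, summed and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

# Group on a temporary tuple key; df['col0'] itself still holds lists.
key = df['col0'].map(tuple)
summed = df.groupby(key, sort=False)['col1'].sum()

# Rebuild a frame whose col0 contains lists again.
out = pd.DataFrame({'col0': [list(k) for k in summed.index],
                    'col1': summed.to_numpy()})
print(out)
```

Passing the tuple Series directly to groupby avoids overwriting the column, so no round trip is needed on the original dataframe.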

score 1 · Answer 2 · answered Jan 06 '22 at 13:08


You can convert to string just for grouping:

import pandas as pd
df = pd.DataFrame({'col0':[[0,1],[1,0,0],[1,0],[1,0],[2,0]],
                   'col1':[5,4,3,2,1]})
# Select col1 explicitly so the list column itself is not aggregated.
df.groupby(df['col0'].astype(str))['col1'].sum()
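Note that with the string approach the group labels are strings such as '[0, 1]', not lists. If the lists are needed afterwards, one option (a sketch, assuming the lists contain only Python literals such as ints) is to parse the labels with ast.literal_eval:

```python
import ast
import pandas as pd

df = pd.DataFrame({'col0': [[0, 1], [1, 0, 0], [1, 0], [1, 0], [2, 0]],
                   'col1': [5, 4, 3, 2, 1]})

# Group on the string form of each list; sort=False keeps first-appearance order.
out = df.groupby(df['col0'].astype(str), sort=False)['col1'].sum()

# The index labels are strings; parse them back into lists if required.
keys = [ast.literal_eval(label) for label in out.index]
print(keys)
```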


OnY

