Example dataset:
import pandas as pd
df_test = pd.DataFrame({
    'a': ['orange', 'lemon', 'banana', 'orange'],
    'b': ['person_a', 'person_a', 'person_b', 'person_b']
})
This gives:
a b
0 orange person_a
1 lemon person_a
2 banana person_b
3 orange person_b
I want to collapse this so that each of person_a and person_b is just one row, and the fruits form a list for each person:
a b
0 ['orange', 'lemon'] person_a
1 ['banana', 'orange'] person_b
How? I can put something equivalent together crudely with for loops, but it feels hacky and it's very slow. My gut says there should be something more native to pandas.
EDIT: answered here: "grouping rows in list in pandas groupby"
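For reference, here is a minimal sketch of the native pandas approach, assuming a reasonably recent pandas version: group by column 'b' and aggregate column 'a' with the built-in list, which turns each person's fruits into a Python list. The name `collapsed` is just an illustrative variable, and the final column reordering is only there to match the desired layout above.

```python
import pandas as pd

df_test = pd.DataFrame({
    'a': ['orange', 'lemon', 'banana', 'orange'],
    'b': ['person_a', 'person_a', 'person_b', 'person_b'],
})

# Group rows by the person in column 'b' and collect each group's
# values from column 'a' into a Python list (row order within a group is kept).
collapsed = df_test.groupby('b')['a'].agg(list).reset_index()

# reset_index() moves 'b' back to a regular column; reorder so the
# list column comes first, matching the desired output.
collapsed = collapsed[['a', 'b']]
print(collapsed)
```

Because groupby sorts the keys by default, person_a comes before person_b, and reset_index() gives the fresh 0, 1 index shown in the desired output. This avoids an explicit Python for loop over the rows.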