pandas dataframe - group artists per unique user

Question

To avoid repeating the same user on every row, I want to organize the data into a dictionary of the form {user: [artist1, artist2, artist3, ...]} using the pandas groupby function. Here is some sample data (my instinct tells me to chain an agg function?)

...like df.groupby('users')?

    users                                       artist
0   00001411dc427966b17297bf4d69e7e193135d89    the most serene republic
1   00001411dc427966b17297bf4d69e7e193135d89    stars
2   00001411dc427966b17297bf4d69e7e193135d89    broken social scene
3   00001411dc427966b17297bf4d69e7e193135d89    have heart
4   00001411dc427966b17297bf4d69e7e193135d89    luminous orange
5   00001411dc427966b17297bf4d69e7e193135d89    boris
6   00001411dc427966b17297bf4d69e7e193135d89    arctic monkeys
7   00001411dc427966b17297bf4d69e7e193135d89    bright eyes
8   00001411dc427966b17297bf4d69e7e193135d89    coaltar of the deepers
9   00001411dc427966b17297bf4d69e7e193135d89    polar bear club
10  00001411dc427966b17297bf4d69e7e193135d89    the libertines
11  00001411dc427966b17297bf4d69e7e193135d89    death from above 1979
12  00001411dc427966b17297bf4d69e7e193135d89    owl city
13  00001411dc427966b17297bf4d69e7e193135d89    coldplay
14  00001411dc427966b17297bf4d69e7e193135d89    okkervil river
15  00001411dc427966b17297bf4d69e7e193135d89    jim sturgess
16  00001411dc427966b17297bf4d69e7e193135d89    deerhoof
17  00001411dc427966b17297bf4d69e7e193135d89    fear before the march of flames
18  00001411dc427966b17297bf4d69e7e193135d89    breathe carolina
19  00001411dc427966b17297bf4d69e7e193135d89    mstrkrft

I can't tell what your expected output is. Could you clarify? — Aran-Fey, Jan 29 '18 at 02:02
Possible duplicate: https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby — jpp, Jan 29 '18 at 02:03

score 2 · Accepted Answer · answered Jan 29 '18 at 02:00

I believe you're looking for groupby + apply(list) here: collect each user's artists into a list, then convert the result to a dictionary.

df.groupby('users').artist.apply(list).to_dict()

{'00001411dc427966b17297bf4d69e7e193135d89': ['the most serene republic',
  'stars',
  'broken social scene',
  'have heart',
  'luminous orange',
  'boris',
  ...
]
}
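For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The two short user IDs and the tiny artist lists are made up for illustration (the question only shows one user), and the second variant, which drops repeated artists within a user, is an assumption about what may be wanted rather than something the question asks for.

```python
import pandas as pd

# Hypothetical sample data: two users, with one artist repeated for "u1".
df = pd.DataFrame({
    "users": ["u1", "u1", "u1", "u2", "u2"],
    "artist": ["stars", "boris", "stars", "coldplay", "owl city"],
})

# One list of artists per user, keeping the original row order within each user.
artists_by_user = df.groupby("users")["artist"].apply(list).to_dict()

# Variant (assumption): remove repeated (user, artist) pairs before grouping.
unique_artists_by_user = (
    df.drop_duplicates(["users", "artist"])
      .groupby("users")["artist"]
      .apply(list)
      .to_dict()
)

print(artists_by_user)
print(unique_artists_by_user)
```

The result is a flat dictionary keyed by user, with a plain Python list as each value, so it can be passed straight to code that does not know about pandas.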
