How to create a dictionary from a Pandas DF where there are duplicate names in a series

Question

I have a Pandas DF which holds names of music albums and other information. Some artists appear in multiple records. I would like to produce a dictionary from this where the key is the artist name and the value is a list of that artist's albums:

The example pandas df looks like this:


          artist                                      album
0           A-ha  Headlines And Deadlines: The Hits Of A-Ha
1           Abba                       Greatest Hits Vol. 2
2            abc                        The Lexicon Of Love
3          AC/DC                              Back In Black
4          AC/DC                            Highway to Hell
5  All About Eve                              All About Eve
6   Jon Anderson                         Olias of Sunhillow
7   Jon Anderson                              Song of Seven

The output I want is:

output = {
'A-ha': ['Headlines And Deadlines: The Hits Of A-Ha'], 
'Abba': ['Greatest Hits Vol. 2'], 
'abc': ['The Lexicon Of Love'], 
'AC/DC': [['Back In Black'],['Highway to Hell']],
'All About Eve': ['All About Eve'], 
'Jon Anderson': [['Olias of Sunhillow'],['Song of Seven']]
}

I have tried looping through the DataFrame and also the df.to_dict options, but I haven't been able to produce the required output.

I get this warning from pandas: UserWarning: DataFrame columns are not unique, some columns will be omitted.

Thanks
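The question doesn't show the code that raised the warning. `DataFrame.to_dict` emits this `UserWarning` when the frame's column labels are not unique, so one plausible cause (an assumption, not confirmed by the asker) is turning the artists into columns, for example with `set_index("artist").T`. The sketch below reproduces the warning on the sample data and shows that each repeated artist then keeps only one of its albums.

```python
import warnings

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artist": ["A-ha", "Abba", "abc", "AC/DC", "AC/DC",
               "All About Eve", "Jon Anderson", "Jon Anderson"],
    "album": ["Headlines And Deadlines: The Hits Of A-Ha", "Greatest Hits Vol. 2",
              "The Lexicon Of Love", "Back In Black", "Highway to Hell",
              "All About Eve", "Olias of Sunhillow", "Song of Seven"],
})

# Hypothetical attempt: make the artists the columns, then convert.
# The transposed frame has duplicate column labels ("AC/DC", "Jon Anderson").
wide = df.set_index("artist").T

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lossy = wide.to_dict()

for w in caught:
    print(w.category.__name__, w.message)

# 8 columns collapse into 6 keys: only one album survives per repeated artist
print(len(lossy), lossy["AC/DC"])
```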

You can check the answers in this question (`groupby` & `apply` or `agg`): https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby – Sergio Gracia, May 29 '21 at 08:07

score 2 · Accepted Answer · answered May 29 '21 at 09:34

You can group by artist, apply list to each group's albums and convert the result with to_dict:

df.groupby('artist')['album'].apply(list).to_dict()

Output:

{'A-ha': ['Headlines And Deadlines: The Hits Of A-Ha'],
 'AC/DC': ['Back In Black', 'Highway to Hell'],
 'Abba': ['Greatest Hits Vol. 2'],
 'All About Eve': ['All About Eve'],
 'Jon Anderson': ['Olias of Sunhillow', 'Song of Seven'],
 'abc': ['The Lexicon Of Love']}
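The keys above come out sorted because `groupby` sorts group keys by default (uppercase letters sort before lowercase, hence `'AC/DC'` before `'Abba'` and `'abc'` last). A self-contained sketch of the same approach, building the sample frame and passing `sort=False` so artists keep their order of first appearance:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artist": ["A-ha", "Abba", "abc", "AC/DC", "AC/DC",
               "All About Eve", "Jon Anderson", "Jon Anderson"],
    "album": ["Headlines And Deadlines: The Hits Of A-Ha", "Greatest Hits Vol. 2",
              "The Lexicon Of Love", "Back In Black", "Highway to Hell",
              "All About Eve", "Olias of Sunhillow", "Song of Seven"],
})

# Group rows by artist and collect each group's albums into a plain list.
# sort=False keeps artists in order of first appearance instead of sorting keys.
albums_by_artist = df.groupby("artist", sort=False)["album"].apply(list).to_dict()
print(albums_by_artist)
```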

P.S. Do you really want a list of lists, like in [['Back In Black'],['Highway to Hell']] or a list of strings like in my output above: ['Back In Black','Highway to Hell']?
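If the list-of-lists shape from the question really is wanted, one way (a sketch, not part of the answer above) is to wrap each album in its own list after grouping. The question's sample output mixes shapes, nesting only artists with several albums; the sketch below nests uniformly, which is usually easier to consume.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artist": ["A-ha", "Abba", "abc", "AC/DC", "AC/DC",
               "All About Eve", "Jon Anderson", "Jon Anderson"],
    "album": ["Headlines And Deadlines: The Hits Of A-Ha", "Greatest Hits Vol. 2",
              "The Lexicon Of Love", "Back In Black", "Highway to Hell",
              "All About Eve", "Olias of Sunhillow", "Song of Seven"],
})

# First collect a flat list of albums per artist, as in the answer above
flat = df.groupby("artist")["album"].apply(list).to_dict()

# Then wrap every album in a one-element list,
# e.g. "AC/DC" -> [["Back In Black"], ["Highway to Hell"]]
nested = {artist: [[album] for album in albums] for artist, albums in flat.items()}
print(nested)
```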

Ade_1 · Answer 2 · score 0 · 2021-05-29T08:41:19.400

You can try this

grouped = df.groupby('artist').agg({'album': lambda x: x.to_list()}).reset_index()

grouped.to_dict('records')
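Note that `to_dict('records')` returns a list with one dict per row (`[{'artist': ..., 'album': [...]}, ...]`), not the `{artist: [albums]}` mapping from the question. A sketch that keeps this answer's aggregation but keys the result by artist instead, using `set_index` before `to_dict`:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artist": ["A-ha", "Abba", "abc", "AC/DC", "AC/DC",
               "All About Eve", "Jon Anderson", "Jon Anderson"],
    "album": ["Headlines And Deadlines: The Hits Of A-Ha", "Greatest Hits Vol. 2",
              "The Lexicon Of Love", "Back In Black", "Highway to Hell",
              "All About Eve", "Olias of Sunhillow", "Song of Seven"],
})

grouped = df.groupby("artist").agg({"album": lambda x: x.to_list()}).reset_index()

# One dict per row, e.g. {"artist": "A-ha", "album": ["Headlines And ..."]}
records = grouped.to_dict("records")

# Use the artist column as the index so to_dict maps artist -> album list
mapping = grouped.set_index("artist")["album"].to_dict()
print(mapping)
```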

edited May 29 '21 at 08:41

answered May 29 '21 at 08:17


I get an error on: grouped = df.groupby('artist').agg({'album': lambda x. x.to_list()}).reset_index(). pycharm doesn't like the "x. x" section. – milesabc123 May 29 '21 at 08:40
Apologies, I had a typo: I omitted the colon in the lambda function. – Ade_1 May 29 '21 at 08:41

By the way, there is no need to write a lambda function; you can also simply do `grouped=df.groupby('artist').agg(list).reset_index()` – Anurag Dabas May 29 '21 at 10:21

Georgy Kopshteyn · Answer 3 · score 0 · 2021-05-29T08:38:57.600

The following should produce the desired result:

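output = {}
# enumerate yields positions 0, 1, 2, ...; df.at looks up index labels,
# so this assumes df has the default RangeIndex
for i, artist in enumerate(df["artist"]):
    if artist in output:
        # artist seen before: append this album as a one-element list
        output[artist].append([df.at[i, "album"]])
    else:
        # first album for this artist: start a new list of lists
        output[artist] = [[df.at[i, "album"]]]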
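The loop above relies on `enumerate` positions matching the index labels that `df.at` looks up, which holds for the default `RangeIndex` but not after filtering or re-indexing. A sketch of the same list-of-lists result that avoids the index entirely by pairing the two columns with `zip` and using `dict.setdefault` (a swapped-in idiom, not the answerer's code; the helper name `group_albums` is just for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artist": ["A-ha", "Abba", "abc", "AC/DC", "AC/DC",
               "All About Eve", "Jon Anderson", "Jon Anderson"],
    "album": ["Headlines And Deadlines: The Hits Of A-Ha", "Greatest Hits Vol. 2",
              "The Lexicon Of Love", "Back In Black", "Highway to Hell",
              "All About Eve", "Olias of Sunhillow", "Song of Seven"],
})


def group_albums(frame):
    """Map each artist to a list of one-element album lists."""
    output = {}
    # zip pairs artist and album positionally, so index labels don't matter
    for artist, album in zip(frame["artist"], frame["album"]):
        # setdefault creates the empty list the first time an artist is seen
        output.setdefault(artist, []).append([album])
    return output


full = group_albums(df)
# Still correct after filtering, when the index labels are no longer 0..n-1
subset = group_albums(df[df["artist"] != "Abba"])
print(full)
```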

edited May 29 '21 at 08:38

answered May 29 '21 at 08:26


This is great. One question: if I wanted to add a second element to the dictionary value (for example a genre), how would I do this? For example: output = {"abba" : [["best of abba"], ["pop"]]...}. Thanks – milesabc123 Jun 01 '21 at 09:05
Glad it helps. As to your question, you have different options depending on the structure you want. To stay with your example, you could do `output["abba"][0] = [output["abba"][0], "pop"]` to add the genre "pop" to the first abba album. The `output` dict would then hold an album-genre pair at this position for the "abba" key. Note that in your case it might make sense to use tuples for the album-genre pairs, since tuples, unlike lists, are immutable. – Georgy Kopshteyn Jun 01 '21 at 10:38
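Following the tuple suggestion, a minimal sketch that stores each album together with its genre as an `(album, genre)` pair. The question's frame has no genre column, so the `genre` column and its values here are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical data: the "genre" column is made up for illustration
df = pd.DataFrame({
    "artist": ["Abba", "AC/DC", "AC/DC"],
    "album": ["Greatest Hits Vol. 2", "Back In Black", "Highway to Hell"],
    "genre": ["pop", "rock", "rock"],
})

output = {}
for artist, album, genre in zip(df["artist"], df["album"], df["genre"]):
    # store each album with its genre as an immutable (album, genre) tuple
    output.setdefault(artist, []).append((album, genre))

print(output)
```

Tuples keep each album and its genre bound together, so later code can unpack them directly, e.g. `for album, genre in output["AC/DC"]: ...`.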
