1

I have a pandas DataFrame that holds the names of music albums and various other information. There are multiple records for the same artist. I would like to build a dictionary from it where the key is the artist name and the value is a list of that artist's albums.

The example pandas df looks like this:


          artist                                      album
0           A-ha  Headlines And Deadlines: The Hits Of A-Ha
1           Abba                       Greatest Hits Vol. 2
2            abc                        The Lexicon Of Love
3          AC/DC                              Back In Black
4          AC/DC                            Highway to Hell
5  All About Eve                              All About Eve
6   Jon Anderson                         Olias of Sunhillow
7   Jon Anderson                              Song of Seven

The output I want is:

output = {
'A-ha': ['Headlines And Deadlines: The Hits Of A-Ha'], 
'Abba': ['Greatest Hits Vol. 2'], 
'abc': ['The Lexicon Of Love'], 
'AC/DC': [['Back In Black'],['Highway to Hell']],
'All About Eve': ['All About Eve'], 
'Jon Anderson': [['Olias of Sunhillow'],['Song of Seven']]
}

I have tried looping through the DataFrame and also the df.to_dict options, but I haven't been able to produce my required output.

I get this warning from pandas: UserWarning: DataFrame columns are not unique, some columns will be omitted.

Thanks

  • 1
    You can check the answers to this question, which use `groupby` with `apply` or `agg`: https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby – Sergio Gracia May 29 '21 at 08:07

3 Answers

2

You can groupby, apply list and convert to_dict:

df.groupby('artist')['album'].apply(list).to_dict()

Output:

{'A-ha': ['Headlines And Deadlines: The Hits Of A-Ha'],
 'AC/DC': ['Back In Black', 'Highway to Hell'],
 'Abba': ['Greatest Hits Vol. 2'],
 'All About Eve': ['All About Eve'],
 'Jon Anderson': ['Olias of Sunhillow', 'Song of Seven'],
 'abc': ['The Lexicon Of Love']}

P.S. Do you really want a list of lists, like in [['Back In Black'],['Highway to Hell']] or a list of strings like in my output above: ['Back In Black','Highway to Hell']?
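To make the difference between the two shapes concrete, here is a minimal, self-contained sketch. It rebuilds the sample DataFrame from the question and derives both a flat list of strings per artist and the nested list of lists shown in the question. The variable names `flat` and `nested` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "artist": ["A-ha", "Abba", "abc", "AC/DC", "AC/DC",
               "All About Eve", "Jon Anderson", "Jon Anderson"],
    "album": ["Headlines And Deadlines: The Hits Of A-Ha",
              "Greatest Hits Vol. 2",
              "The Lexicon Of Love",
              "Back In Black",
              "Highway to Hell",
              "All About Eve",
              "Olias of Sunhillow",
              "Song of Seven"],
})

# Flat form: one list of album titles per artist
flat = df.groupby("artist")["album"].apply(list).to_dict()

# Nested form: each album wrapped in its own single-element list
nested = {artist: [[album] for album in albums]
          for artist, albums in flat.items()}

print(flat["AC/DC"])
print(nested["AC/DC"])
```

The flat form is usually easier to work with. The nested form is only worth building if later code really expects one list per album.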

perl
0

You can try this

grouped = df.groupby('artist').agg({'album': lambda x: x.to_list()}).reset_index()

grouped.to_dict('records')
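Note that `to_dict('records')` returns a list with one dictionary per row rather than a single dictionary keyed by artist. The sketch below, which uses a shortened version of the sample data, shows that shape and one possible way to turn the grouped frame into the artist-to-albums mapping the question asks for. The names `records` and `by_artist` are illustrative.

```python
import pandas as pd

# Shortened sample data based on the question
df = pd.DataFrame({
    "artist": ["Abba", "AC/DC", "AC/DC"],
    "album": ["Greatest Hits Vol. 2", "Back In Black", "Highway to Hell"],
})

grouped = df.groupby("artist").agg({"album": lambda x: x.to_list()}).reset_index()

# A list of row dictionaries, each with an 'artist' key and an 'album' key
records = grouped.to_dict("records")

# A single dictionary keyed by artist name
by_artist = grouped.set_index("artist")["album"].to_dict()

print(records)
print(by_artist)
```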
Ade_1
  • I get an error on: grouped = df.groupby('artist').agg({'album': lambda x. x.to_list()}).reset_index(). pycharm doesn't like the "x. x" section. – milesabc123 May 29 '21 at 08:40
  • apologies had a typo, omitted the colon in the lambda function – Ade_1 May 29 '21 at 08:41
  • 1
    By the way, there is no need to write a lambda function; you can simply do `grouped=df.groupby('artist').agg(list).reset_index()` – Anurag Dabas May 29 '21 at 10:21
0

The following should produce the desired result:

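output = {}
# Iterate over the column values directly, so the loop does not rely on a default 0..n-1 index
for artist, album in zip(df["artist"], df["album"]):
    # Wrap each album in its own list to match the nested format in the question
    output.setdefault(artist, []).append([album])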
Georgy Kopshteyn
  • This is great. One question, if I wanted to add in a second element into the dictionary value (for example a genre) how would I do this. for example: output = {"abba" : [["best of abba"], ["pop"]]...}. Thanks – milesabc123 Jun 01 '21 at 09:05
  • Glad it helps. As to your question, you have different options depending on the structure you want. To stay with your example, you could do `output["abba"][0] = [output["abba"][0], "pop"]` to add the genre "pop" to the first abba album. The `output` dict would then contain a list with an album-genre pair at this position for the "abba" key. Note that in your case it might make sense to use tuples for the album-genre pairs, since tuples, in contrast to lists, are immutable. – Georgy Kopshteyn Jun 01 '21 at 10:38
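As a rough sketch of the tuple idea from the last comment: assuming the DataFrame also had a `genre` column (hypothetical, it is not in the question's data), the album-genre pairs could be collected while the dictionary is built, rather than patched in afterwards.

```python
import pandas as pd

# Shortened sample data; the "genre" column and its values are assumed for illustration
df = pd.DataFrame({
    "artist": ["Abba", "AC/DC", "AC/DC"],
    "album": ["Greatest Hits Vol. 2", "Back In Black", "Highway to Hell"],
    "genre": ["pop", "rock", "rock"],
})

output = {}
for artist, album, genre in zip(df["artist"], df["album"], df["genre"]):
    # Store each album together with its genre as an immutable (album, genre) tuple
    output.setdefault(artist, []).append((album, genre))

print(output)
```

Each value is then a list of `(album, genre)` tuples, so the genre of an artist's first album is available as `output["Abba"][0][1]`.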