I am working on a project where I want to analyze some music data from Spotify. I have run into an issue that I can't seem to find an answer to, so I'd be thankful for any help!

The script that retrieves the data iterates over each artist and appends each track to a DataFrame. It filters out duplicate rows, but since the same song can be released by more than one artist, rows that share a track id but differ in artist are not skipped. So in the end I have a couple of thousand entries that look like this:

artist      id
Jet         34Vqb2m74NU6Pb682ymHic
Wings       34Vqb2m74NU6Pb682ymHic
Mac Miller  34Vqb2m74NU6Pb682ymHic

What is the best way to collapse them into a single row per id, with all the artists listed together, like this:

artist                  id
Jet, Wings, Mac Miller  34Vqb2m74NU6Pb682ymHic

I have the dataset stored in a Pandas DataFrame.

Thanks in advance!

1 Answer

Given this DataFrame:

In [1625]: df
Out[1625]: 
       artist                      id
0         Jet  34Vqb2m74NU6Pb682ymHic
1       Wings  34Vqb2m74NU6Pb682ymHic
2  Mac Miller  34Vqb2m74NU6Pb682ymHic

Use GroupBy.agg to group the rows by id and join the artist names within each group:

In [1629]: df.groupby('id', as_index=False).agg(', '.join)
Out[1629]: 
                       id                  artist
0  34Vqb2m74NU6Pb682ymHic  Jet, Wings, Mac Miller
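
For reference, here is a self-contained sketch of the same idea that can be run outside an interactive session. It assumes the DataFrame has exactly the two columns shown above. The fourth row ("Example Artist" with the id "hypothetical_track_id") is made up purely to show that other tracks stay on their own row. The drop_duplicates step is an optional addition, not part of the one-liner above; it only guards against the same artist appearing twice for one id.

```python
import pandas as pd

# Sample data mirroring the question; the last row is hypothetical.
df = pd.DataFrame({
    "artist": ["Jet", "Wings", "Mac Miller", "Example Artist"],
    "id": [
        "34Vqb2m74NU6Pb682ymHic",
        "34Vqb2m74NU6Pb682ymHic",
        "34Vqb2m74NU6Pb682ymHic",
        "hypothetical_track_id",
    ],
})

combined = (
    df.drop_duplicates(subset=["id", "artist"])  # optional: avoid repeated names per id
      .groupby("id", sort=False)["artist"]       # sort=False keeps first-appearance order
      .agg(", ".join)                            # join the artist names in each group
      .reset_index()                             # turn the id index back into a column
)

print(combined)
```

Selecting the "artist" column explicitly before calling agg keeps the join from being applied to any other columns the real dataset might have.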
Mayank Porwal