
In this DataFrame, some values in the 'artists' column contain two artists whose names are separated by a ",". I am trying to remove these rows and replace each one with a separate new row for each artist.

Basically, I am trying to find the rows that meet a criterion:

```python
featured_artists_index = raw_data_df['artists'].str.contains(',').tolist()
```

and make a new row for each individual artist in the comma-separated value:

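```python
new_rows = []
for idx, row in raw_data_df.loc[featured_artists_index].iterrows():
    for artist in row['artists'].split(','):
        new_row = row.copy()                 # copy per artist so each appended row is a separate object
        new_row['artists'] = artist.strip()  # drop the space that follows the comma
        new_rows.append(new_row)
```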

and finally remove the original rows and append the new ones:

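```python
import pandas as pd
raw_data_df.drop(raw_data_df.index[featured_artists_index], inplace=True)
raw_data_df = pd.concat([raw_data_df, pd.DataFrame(new_rows)])  # DataFrame.append returned a new frame and was removed in pandas 2.0
```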
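The select, drop, and append steps above can be sketched with a boolean mask and `pd.concat`. This is a minimal sketch on hypothetical toy data: the `song` column and all values are made up, and `new_rows` is written out by hand in place of the loop's output. `regex=False` matches the comma literally, `na=False` treats missing values as "no comma" (otherwise they become NaN and break the mask), and `pd.concat` stands in for `DataFrame.append`, which returns a new frame instead of modifying in place.

```python
import pandas as pd

# Hypothetical toy data standing in for raw_data_df
raw_data_df = pd.DataFrame({
    "song": ["s1", "s2", "s3", "s4"],
    "artists": ["A", "B, C", None, "D"],
})

# Boolean Series: True where the value contains a literal comma;
# na=False makes the missing value count as False instead of NaN
mask = raw_data_df["artists"].str.contains(",", regex=False, na=False)

# Hand-written stand-in for the new_rows list built by the loop
# (one Series per artist, named with the original row's index label)
new_rows = [
    pd.Series({"song": "s2", "artists": "B"}, name=1),
    pd.Series({"song": "s2", "artists": "C"}, name=1),
]

# Keep the non-matching rows, add the new ones, and reassign the result;
# the stable (mergesort) sort puts the split rows where the original row was
raw_data_df = pd.concat([raw_data_df[~mask], pd.DataFrame(new_rows)])
raw_data_df = raw_data_df.sort_index(kind="mergesort")
```

The final `sort_index` call is optional; without it the split rows simply end up at the bottom, as with the original `append`.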

But this solution is pretty slow, and I am wondering whether pandas has functions that would make this more efficient and are a better fit for this task.

Thanks!

  • Does this answer your question? [Pandas column of lists, create a row for each list element](https://stackoverflow.com/questions/27263805/pandas-column-of-lists-create-a-row-for-each-list-element) – Derek O Oct 12 '20 at 03:38
  • Yep, that solution works, but so does the pandas explode function (found in that thread), which is a bit more concise: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html#pandas.DataFrame.explode – confusedcoder Oct 12 '20 at 05:24
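For reference, the `DataFrame.explode` approach mentioned in the comment can replace all three steps in one chain. A minimal sketch, assuming pandas 0.25 or later and hypothetical toy data (the `song` column and values are made up): `str.split(",")` turns each cell into a list of names, `explode` emits one row per list element while repeating the other columns, and values without a comma become one-element lists, so those rows pass through unchanged.

```python
import pandas as pd

# Hypothetical toy data standing in for raw_data_df
raw_data_df = pd.DataFrame({
    "song": ["s1", "s2", "s3"],
    "artists": ["A", "B, C", "D"],
})

expanded = (
    raw_data_df
    # Replace each artists string with the list of names it contains
    .assign(artists=raw_data_df["artists"].str.split(","))
    # One row per list element; song and the index label are repeated
    .explode("artists")
    # Remove the space that followed each comma
    .assign(artists=lambda d: d["artists"].str.strip())
)
```

The split rows keep the original index label (here `1` appears twice); passing `ignore_index=True` to `explode` (pandas 1.1+) gives a fresh 0..n-1 index instead.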
