0

I have a pandas dataframe in which one column (titled 'labels') contains a string of comma-separated values within a list:

name title labels
John III   ['ABC, DEF, GHI']

I would like to create new rows that separates the list at each comma, and duplicate the data for all the other columns (name, title, etc):

name title labels 
John III   ['ABC']
John III   ['DEF']
John III   ['GHI']

This answer using explode should work, but pandas is interpreting my column as an object. I tried to convert the column to a string: df['labels'].astype(str) but that didn't work, and this df['labels'].apply(', '.join) separates at the letter level rather than the whole word.

Any thoughts?

1 Answers1

0

You can do this by using strip(), split() and apply() method :

df['labels']=df['labels'].str.strip('[]').str.split(',').apply(list)

Finally use explode() method:

df=df.explode(column='labels',ignore_index=True)

Now if you print df you will get your expected output:

    name    title   labels
0   John    III     ABC
1   John    III     DEF
2   John    III     GHI

And if you want exact same output then make use of apply() method:

df['labels']=df['labels'].apply(lambda x:'['+str(x)+']')

Now if you print df you will get:

    name    title   labels
0   John    III     [ABC]
1   John    III     [DEF]
2   John    III     [GHI]
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41