ID       Info
1       '123','34','123'
2       'NA' ,'12','NA','567'

To remove the duplicates from the Info column, I am using the following code. It gives me only the df['Info'] series, sans duplicates:

df['Info'].str.split(',').map(set).agg(','.join)

How do I get the following dataframe? I need to get back the data frame that contains both the 'ID' and 'Info' columns (sans duplicates):

ID       Info
1       '123','34'
2       'NA' ,'12','567'
RSK
  • Use `df['Info'].str.split(',').map(set).agg(','.join).reset_index(name='Info')` – jezrael Oct 09 '18 at 12:28
  • I am not able to get the 'ID' column that way. I need to get the original data frame that contains the 'ID' and 'Info' columns (sans duplicates) – RSK Oct 09 '18 at 12:59
  • You are right, need `df = df.set_index('ID')['Info'].str.split(',').map(set).agg(','.join).reset_index(name='Info')` – jezrael Oct 09 '18 at 13:01
  • It worked. Thanks for your help – RSK Oct 09 '18 at 13:12
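
For reference, here is a minimal sketch of an alternative to the `set`-based one-liner from the comments. It assumes `Info` holds plain comma-separated strings exactly as shown in the question; the sample frame is reconstructed from the question, and `dedupe_tokens` is a hypothetical helper name, not part of pandas. A `set` does not guarantee the original order of the values, so this version keeps the first occurrence of each value in place, and it writes the result back into the same frame so the `ID` column is never lost.

```python
import pandas as pd


def dedupe_tokens(text: str, sep: str = ",") -> str:
    """Drop repeated tokens, keeping the first occurrence and the original order.

    Hypothetical helper for illustration; not a pandas function.
    """
    first_seen = {}
    for token in text.split(sep):
        # Compare tokens with surrounding whitespace stripped, so "'NA' " and
        # "'NA'" count as the same value, but keep the first spelling as written.
        first_seen.setdefault(token.strip(), token)
    return sep.join(first_seen.values())


# Sample data reconstructed from the question (assumed to be plain strings).
df = pd.DataFrame({
    "ID": [1, 2],
    "Info": ["'123','34','123'", "'NA' ,'12','NA','567'"],
})

# assign() returns a copy with Info replaced and ID left untouched.
result = df.assign(Info=df["Info"].apply(dedupe_tokens))
print(result)
```

If the order of the values does not matter, the `set_index('ID')` version from the comments above gets the same two columns back.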

0 Answers