Removing duplicates from a column of a pandas DataFrame

Asked Oct 09 '18 at 12:27

Active Oct 09 '18 at 13:03

Viewed 41 times

ID       Info
1       '123','34','123'
2       'NA' ,'12','NA','567'

To remove the duplicates from the Info column, I am using the following code. It gives me the df['Info'] Series sans duplicates:

df['Info'].str.split(',').map(set).agg(','.join)

How do I get the following dataframe? I need to get back the data frame that contains both the 'ID' and 'Info' columns (sans duplicates):

ID       Info
1       '123','34'
2       'NA' ,'12','567'

edited Oct 09 '18 at 13:03

RSK

Use `df['Info'].str.split(',').map(set).agg(','.join).reset_index(name='Info')` – jezrael Oct 09 '18 at 12:28
I am not able to get the 'ID' column that way. I need to get the original data frame that contains both the 'ID' and 'Info' columns (sans duplicates) – RSK Oct 09 '18 at 12:59
You are right, need `df = df.set_index('ID')['Info'].str.split(',').map(set).agg(','.join).reset_index(name='Info')` – jezrael Oct 09 '18 at 13:01
It worked. Thanks for your help – RSK Oct 09 '18 at 13:12
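
For reference, a minimal sketch of another way to keep the 'ID' column: assign the de-duplicated result back to df['Info'] instead of working on the detached Series. This is not the code from the comments. It makes a few assumptions: the sample frame is rebuilt from the question, the helper name `dedupe_items` is made up for illustration, `dict.fromkeys` is swapped in for `set` so the first-seen order of the items is kept (a `set` has no guaranteed order), and each item is stripped of surrounding whitespace so that `'NA' ` and `'NA'` count as the same value.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({
    "ID": [1, 2],
    "Info": ["'123','34','123'", "'NA' ,'12','NA','567'"],
})


def dedupe_items(cell: str) -> str:
    """Drop repeated comma-separated items, keeping first-seen order.

    Hypothetical helper, not part of pandas.
    """
    # Strip whitespace so "'NA' " and "'NA'" are treated as the same item.
    items = [item.strip() for item in cell.split(",")]
    # dict.fromkeys keeps one key per distinct item, in insertion order.
    return ",".join(dict.fromkeys(items))


# Assigning back to the column leaves 'ID' (and any other columns) untouched.
df["Info"] = df["Info"].map(dedupe_items)
print(df)
```

Note that stripping also normalizes the spacing, so the second row comes out as `'NA','12','567'` rather than `'NA' ,'12','567'`. `map` is used for the whole step because it is always applied element by element.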

0 Answers