
I have a dataframe like this:

import pandas as pd

df3 = pd.DataFrame({'ID': ['Stay home, T5006, T5006, Stay home', 'Go for walk, T5007, T5007, Go for walk'],
                    'Name': ['Stay home, Go for walk,  Stay home', 'Go outside, Go outside, Go outside']
                    })


    ID                                      Name
0   Stay home, T5006, T5006, Stay home      Stay home, Go for walk, Stay home
1   Go for walk, T5007, T5007, Go for walk  Go outside, Go outside, Go outside

I want to delete the duplicates from the ID column. Expected outcome:

    ID                  Name
0   Stay home,T5006     Stay home,  Go for walk, Stay home
1   Go for walk,T5007   Go outside, Go outside,  Go outside

Any ideas?

xavi

1 Answer


Use the dict.fromkeys trick to remove duplicates from the split values (dict keys are unique and, since Python 3.7, keep insertion order), then join them back with ', ' in a lambda function:

df3['ID'] = df3['ID'].apply(lambda x: ', '.join(dict.fromkeys(x.split(', '))))

Or use a list comprehension:

df3['ID'] = [', '.join(dict.fromkeys(x.split(', '))) for x in df3['ID']]

print (df3)
                   ID                                Name
0    Stay home, T5006  Stay home, Go for walk,  Stay home
1  Go for walk, T5007  Go outside, Go outside, Go outside
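A self-contained sketch of the same dict.fromkeys approach, wrapped in a small helper. The name `dedupe_items` is hypothetical, and the `.strip()` call is an added assumption (not part of the answer above) so that items separated by irregular spacing still compare equal; drop it if spacing is significant in your data.

```python
import pandas as pd


def dedupe_items(text: str, sep: str = ",") -> str:
    """Remove repeated separator-delimited items, keeping first-seen order.

    Hypothetical helper: splits on `sep`, strips surrounding whitespace from
    each item (an assumption), and relies on dict keys being unique and
    insertion-ordered (guaranteed since Python 3.7).
    """
    items = (item.strip() for item in text.split(sep))
    return ", ".join(dict.fromkeys(items))


df3 = pd.DataFrame({
    'ID': ['Stay home, T5006, T5006, Stay home',
           'Go for walk, T5007, T5007, Go for walk'],
    'Name': ['Stay home, Go for walk,  Stay home',
             'Go outside, Go outside, Go outside'],
})

# Series.map calls the helper once per cell
df3['ID'] = df3['ID'].map(dedupe_items)
print(df3['ID'].tolist())  # ['Stay home, T5006', 'Go for walk, T5007']
```

The stripping matters for cells like the first `Name` value: splitting 'Stay home, Go for walk,  Stay home' on ', ' yields ' Stay home' (with a leading space) as a separate item, so the plain `split(', ')` version would not treat it as a duplicate, while `dedupe_items` returns 'Stay home, Go for walk'.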

Or, if order is not important, use sets (the order of the joined values is then arbitrary and may differ between runs):

# with apply:
df3['ID'] = df3['ID'].apply(lambda x: ', '.join(set(x.split(', '))))
# or, equivalently, with a list comprehension (use one or the other):
df3['ID'] = [', '.join(set(x.split(', '))) for x in df3['ID']]
print (df3)
                   ID                                Name
0    Stay home, T5006  Stay home, Go for walk,  Stay home
1  T5007, Go for walk  Go outside, Go outside, Go outside
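Because Python randomizes string hashing per interpreter run, iterating a set of strings can give a different order each time, which is why row 1 above came out as 'T5007, Go for walk'. The sketch below checks that the set version keeps the same items as the order-preserving version, and then shows a `sorted(set(...))` variant; that variant is an addition, not part of the original answer, and gives a deterministic but alphabetical order rather than the original one.

```python
import pandas as pd

df3 = pd.DataFrame({
    'ID': ['Stay home, T5006, T5006, Stay home',
           'Go for walk, T5007, T5007, Go for walk'],
})

# Set-based dedupe: same items, but the order is arbitrary and may vary per run
unordered = [', '.join(set(x.split(', '))) for x in df3['ID']]

# Order-preserving dedupe (dict.fromkeys) for comparison
ordered = [', '.join(dict.fromkeys(x.split(', '))) for x in df3['ID']]

# Both versions keep exactly the same items in each row
for u, o in zip(unordered, ordered):
    assert set(u.split(', ')) == set(o.split(', '))

# Deterministic alternative: sort the unique items (alphabetical, not original order)
df3['ID'] = [', '.join(sorted(set(x.split(', ')))) for x in df3['ID']]
print(df3['ID'].tolist())  # ['Stay home, T5006', 'Go for walk, T5007']
```

Here the sorted result happens to match the original order only because 'Stay home' and 'Go for walk' sort before the 'T...' codes; in general, sorting reorders the items.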
jezrael
  • Could you please elaborate on what `dict.fromkeys` does? – Karthik S Oct 04 '22 at 09:31
  • @KarthikS - Sure, check [this](https://stackoverflow.com/a/17016257/2901002) – jezrael Oct 04 '22 at 09:34
  • Thanks, so basically it converts a list or set into unique key-value pairs, and if no value is given, `None` is used as the default. I didn't know `''.join` joins dictionary keys. Thanks! – Karthik S Oct 04 '22 at 09:40
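A small sketch of what the comments above describe: `dict.fromkeys` takes the keys from any iterable in first-seen order, sets every value to `None` unless a second argument is passed, and `str.join` works on the resulting dict because iterating a dict yields its keys.

```python
# dict.fromkeys builds a dict whose keys come from the iterable, in first-seen order;
# every value defaults to None unless a second argument is given.
items = 'Stay home, T5006, T5006, Stay home'.split(', ')
d = dict.fromkeys(items)
print(d)  # {'Stay home': None, 'T5006': None}

# An explicit default value can be supplied as the second argument
print(dict.fromkeys(items, 0))  # {'Stay home': 0, 'T5006': 0}

# Iterating over a dict yields its keys, so str.join sees only the unique keys
print(', '.join(d))  # Stay home, T5006
```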