I have a DataFrame column named codes, as shown below:

codes
-----
[K70, X090a2, T8a981,X090a2]
[A70, X90a2, T8a91,A70,A70]
[B70, X09a2, T8a81]
[C70, X00a2, T8981,X00a2,C70]

I want output like the following in a DataFrame. For each row, I need to remove the duplicates, keep only the unique values, and then unlist the result.

I tried dict.fromkeys(z1['codes']) because dictionary keys cannot contain duplicates.

I also tried a for loop with a count, but did not get the expected results.

Output column:
codes
-----
K70 X090a2 T8a981
A70 X90a2 T8a91
B70 X09a2 T8a81
C70 X00a2 T8981
S S

2 Answers

If the column contains lists, remove the duplicates with dict.fromkeys and then join the values with whitespace:

# if the values are strings, convert them to lists first
# z1['codes'] = z1['codes'].str.strip('[]').str.split(r',\s*')

# dict.fromkeys keeps the first occurrence of each code, in order
z1['codes'] = z1['codes'].apply(lambda x: ' '.join(dict.fromkeys(x)))
print(z1)
               codes
0  K70 X090a2 T8a981
1    A70 X90a2 T8a91
2    B70 X09a2 T8a81
3    C70 X00a2 T8981
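
As a minimal sketch of the approach above, the helper below accepts either a real list or a bracketed string such as "[A70, X90a2, T8a91,A70,A70]". The name dedupe_join is hypothetical, not from the answer, and the string branch assumes that codes never contain commas or square brackets.

```python
import re


def dedupe_join(value):
    """Return the unique codes in first-seen order, joined by a space."""
    if isinstance(value, str):
        # Assumption: the cell is a string like "[K70, X090a2, T8a981,X090a2]".
        parts = re.split(r",\s*", value.strip().strip("[]"))
        value = [p.strip() for p in parts if p.strip()]
    # dict keys are unique and keep insertion order (Python 3.7+).
    return " ".join(dict.fromkeys(value))


rows = [
    ["K70", "X090a2", "T8a981", "X090a2"],  # already a list
    "[A70, X90a2, T8a91,A70,A70]",          # list stored as a string
]
for row in rows:
    print(dedupe_join(row))
# K70 X090a2 T8a981
# A70 X90a2 T8a91
```

With pandas, the same helper could be applied as z1['codes'] = z1['codes'].apply(dedupe_join).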
jezrael
    Happy to have learnt about `dict.fromkeys(x).keys()` which looks like a great order-preserving duplicate removal trick! See https://stackoverflow.com/a/37163210/6159698 – arnaud Dec 23 '20 at 13:47

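set removes the duplicates from a list, and join turns the result into a single whitespace-separated string. Note that a set does not preserve the original order of the codes, so the order within each row may differ from the output shown here.

z1['codes'] = z1['codes'].apply(lambda code: " ".join(set(code)))
print(z1)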
               codes
0  K70 X090a2 T8a981
1    A70 X90a2 T8a91
2    B70 X09a2 T8a81
3    C70 X00a2 T8981
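
If the original order matters, one possible variant (a sketch, not part of the answer above) is to sort the set by each code's first position in the original list. The variable names below are illustrative only.

```python
codes = ["K70", "X090a2", "T8a981", "X090a2"]

# set(): duplicates are removed, but the iteration order is arbitrary.
unordered = " ".join(set(codes))

# Sorting the set by first position in the original list restores
# first-appearance order.
ordered = " ".join(sorted(set(codes), key=codes.index))

# Both strings contain the same codes; only the order can differ.
print(set(unordered.split()) == set(ordered.split()))  # True
print(ordered)  # K70 X090a2 T8a981
```

Because codes.index scans the list for every element, this is quadratic in the row length. That is fine for short rows like these, while dict.fromkeys gives the same order in a single pass.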
Snehal Nair