I asked a similar question a while ago, and the solution works until a cell contains an item made of more than one word, such as "dragon fruit". In addition, some duplicates are not removed completely. How can I handle this edge case?
import numpy as np
import pandas as pd

data = {'fruit1': ["organge, apple", "apple", "organge, dragon fruit", "organge, others", "others"],
        'fruit2': ["apple, organge", "others", "dragon fruit, organge", "watermelon", "others"]}
df = pd.DataFrame(data)

# Current attempt: de-duplicates whole cell strings, not the individual comma-separated items
df["together"] = (df[['fruit1', 'fruit2']].replace('others', np.nan)
                  .apply(lambda x: ' '.join(pd.unique(x.dropna())), axis=1)
                  .replace('', 'others')
                  )
                  fruit1                 fruit2
0         organge, apple         apple, organge
1                  apple                 others
2  organge, dragon fruit  dragon fruit, organge
3        organge, others             watermelon
4                 others                 others
Expected result:
                  fruit1                 fruit2               together
0         organge, apple         apple, organge         apple, organge
1                  apple                 others                  apple
2  organge, dragon fruit  dragon fruit, organge  organge, dragon fruit
3        organge, others             watermelon    organge, watermelon
4                 others                 others                 others
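
One possible direction, sketched here under a few assumptions rather than as a definitive fix: split each cell on commas (not on whitespace) so that "dragon fruit" stays a single item, drop the "others" placeholder, and de-duplicate the items while keeping first-seen order. The helper name `combine_fruits` and its `sep` and `placeholder` parameters are hypothetical, introduced only for illustration. Note that for row 0 this yields "organge, apple", which has the same items as the expected "apple, organge" but in first-seen order.

```python
import pandas as pd

data = {
    "fruit1": ["organge, apple", "apple", "organge, dragon fruit", "organge, others", "others"],
    "fruit2": ["apple, organge", "others", "dragon fruit, organge", "watermelon", "others"],
}
df = pd.DataFrame(data)


def combine_fruits(row, sep=", ", placeholder="others"):
    """Hypothetical helper: merge the comma-separated items of one row."""
    seen = {}  # dict keys keep insertion order, so first-seen order is preserved
    for cell in row:
        # Split on commas only, so multi-word items like "dragon fruit" survive
        for item in str(cell).split(","):
            item = item.strip()
            if item and item != placeholder:
                seen.setdefault(item, None)
    # Fall back to the placeholder when nothing else is left
    return sep.join(seen) if seen else placeholder


df["together"] = df[["fruit1", "fruit2"]].apply(combine_fruits, axis=1)
print(df)
```

If the exact item order in the expected result matters (row 0 differs from rows 2 and 3 in that respect), the ordering rule would need to be stated explicitly; sorting the items inside the helper would be one option.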