I want to achieve something like in this post: Python Dataframe: Remove duplicate words in the same cell within a column in Python, but for the entire dataframe in an efficient way.
My data looks something like this: it is a pandas DataFrame with a lot of columns containing comma-separated strings, many of which hold duplicate values, and I wish to remove the duplicates within each individual string.
+--------------------+---------+---------------------+
| Col1 | Col2 | Col3 |
+--------------------+---------+---------------------+
| Dog, Dog, Dog | India | Facebook, Instagram |
| Dog, Squirrel, Cat | Norway | Facebook, Facebook |
| Cat, Cat, Cat | Germany | Twitter |
+--------------------+---------+---------------------+
Reproducible example:
import pandas as pd

df = pd.DataFrame({"col1": ["Dog, Dog, Dog", "Dog, Squirrel, Cat", "Cat, Cat, Cat"],
                   "col2": ["India", "Norway", "Germany"],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
I would like to transform it into this:
+--------------------+---------+---------------------+
| Col1 | Col2 | Col3 |
+--------------------+---------+---------------------+
| Dog | India | Facebook, Instagram |
| Dog, Squirrel, Cat | Norway | Facebook |
| Cat | Germany | Twitter |
+--------------------+---------+---------------------+
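One obvious approach would be an element-wise apply over every cell, sketched below for the reproducible example above (this assumes every affected cell is a plain comma-separated string; dedupe_cell is just an illustrative helper name). It produces the desired output, but it calls a Python function on every single cell, which I suspect will be slow on a frame with many columns and rows, hence the question about a more efficient way:

def dedupe_cell(value):
    # Split on ", ", keep the first occurrence of each token
    # (dict.fromkeys preserves insertion order), then re-join.
    return ", ".join(dict.fromkeys(value.split(", ")))

# Element-wise over the whole frame; assumes every cell is a string.
# (In pandas >= 2.1, DataFrame.map replaces the deprecated applymap.)
result = df.applymap(dedupe_cell)
print(result)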