Remove consecutive duplicate words from a comma separated string

Asked Feb 23 '22 at 00:06

Active Feb 23 '22 at 00:37

Viewed 206 times

If I have a string in a column that is separated by commas:

"apple, apple, banana, pear, apple"

How would I remove only consecutive duplicated words? My desired output would look like this:

"apple, banana, pear, apple"

The words in the string are separated by commas. I understand from the post "How do I use itertools.groupby()?" that this would work if the words were in a list, but my case is slightly different: it is a single string, and there is a comma after each word. Thank you in advance.

edited Feb 23 '22 at 00:37

asked Feb 23 '22 at 00:06

missatomicbomb

is it a string in a list or a list of strings or a single string? – Feb 23 '22 at 00:08
It's a string in a column in a data frame – missatomicbomb Feb 23 '22 at 00:09
You could make use of `itertools.groupby()` – Barmar Feb 23 '22 at 00:10
@Barmar How would I go about this if the string contains commas? 'apple, apple, pear, apple' is one string, with the commas inside it. – missatomicbomb Feb 23 '22 at 00:39
Regarding the edit, you just need to convert the string to a list and back. Do you know how to use `str.split()` and `str.join()`? If not, you really should learn. In short it'd be something like `sep = ', '; sep.join(remove_consecutive_duplicates(string.split(sep)))`. Or do you want a Pandas-specific solution? (I'm not sure if one exists, but I do know Pandas has vectorized string operations.) – wjandrea Feb 23 '22 at 00:41
@wjandrea Thank you. After playing around with it I finally got what I needed. – missatomicbomb Feb 23 '22 at 02:43
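
For later readers, the split / `itertools.groupby()` / join approach suggested in the comments could look roughly like the sketch below. It assumes the separator is always exactly `', '` (comma plus one space). `remove_consecutive_duplicates` is the name used in the comment above, but its body here is a guess at what was meant, and `dedupe_consecutive` is a hypothetical helper name, not something from the thread.

```python
from itertools import groupby


def remove_consecutive_duplicates(words):
    """Collapse each run of equal adjacent items down to a single item."""
    # groupby() yields one (key, group) pair per run of equal neighbours,
    # so keeping only the keys drops the consecutive repeats.
    return [key for key, _group in groupby(words)]


def dedupe_consecutive(text, sep=", "):
    """Split on sep, drop consecutive repeats, then join back with sep.

    Assumes every word is separated by exactly `sep` (hypothetical helper).
    """
    return sep.join(remove_consecutive_duplicates(text.split(sep)))


print(dedupe_consecutive("apple, apple, banana, pear, apple"))
# apple, banana, pear, apple
```

Since the string lives in a data frame column, one option is to apply the function to each cell, e.g. `df["fruits"] = df["fruits"].map(dedupe_consecutive)`, where `df` and `"fruits"` are placeholder names. This is a plain element-wise call rather than a vectorized Pandas string operation.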

0 Answers