Suppose I have a string in a column whose words are separated by commas:

"apple, apple, banana, pear, apple"

How would I remove only consecutive duplicated words? My desired output would look like this:

"apple, banana, pear, apple"

The words in the string are separated by commas. I understand from [this post] (How do I use itertools.groupby()?) that this works when the words are in a list, but my case is slightly different: it is a single string with a comma after each word. Thank you in advance.

  • Is it a string in a list, a list of strings, or a single string? –  Feb 23 '22 at 00:08
  • It's a string in a column in a data frame – missatomicbomb Feb 23 '22 at 00:09
  • You could make use of `itertools.groupby()` – Barmar Feb 23 '22 at 00:10
  • @Barmar How would I go about this if the string contains commas? 'apple, apple, pear, apple' is a single string with the commas inside it. – missatomicbomb Feb 23 '22 at 00:39
  • Regarding the edit, you just need to convert the string to a list and back. Do you know how to use `str.split()` and `str.join()`? If not, you really should learn. In short it'd be something like `sep = ', '; sep.join(remove_consecutive_duplicates(string.split(sep)))`. Or do you want a Pandas-specific solution? (I'm not sure if one exists, but I do know Pandas has vectorized string operations.) – wjandrea Feb 23 '22 at 00:41
  • @wjandrea Thank you. After playing around with it I finally got what I needed. – missatomicbomb Feb 23 '22 at 02:43
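
For anyone landing here later, the split / `groupby` / join approach suggested in the comments could look roughly like the sketch below. The function name `dedupe_consecutive_words` is made up for illustration, and it assumes the separator is exactly `', '` (comma plus one space); strings with inconsistent spacing would need extra cleanup first.

```python
from itertools import groupby


def dedupe_consecutive_words(text, sep=", "):
    """Collapse runs of identical adjacent items in a delimited string.

    Hypothetical helper: assumes every item is separated by exactly `sep`.
    """
    items = text.split(sep)
    # groupby yields one (key, group) pair per run of equal adjacent items,
    # so keeping only the keys drops consecutive repeats.
    deduped = [key for key, _group in groupby(items)]
    return sep.join(deduped)


result = dedupe_consecutive_words("apple, apple, banana, pear, apple")
print(result)  # apple, banana, pear, apple
```

Non-adjacent repeats (the final `apple`) are kept, which matches the desired output in the question. For a data frame column, one option is to pass the function to `Series.apply`, e.g. `df["fruits"].apply(dedupe_consecutive_words)`, where `df` and `"fruits"` are placeholder names.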
