Python: remove duplicate characters

Question

Input string: Tem1 = 'Hhelloo ookkee'

I want the output to be Tem1 = 'helo oke'.

I have tried the approach from this Stack Overflow question (Python: Best Way to remove duplicate character from string).

I've also tried using itertools, but when I save the result to a CSV file, the stored text is unchanged and still contains lots of duplicate characters.

import itertools
tem1 = sum(val*(2**idx) for idx, val in enumerate(reversed(tem)))
if bit[0:8]==[1,0,0,1,1,0,0,1]:
    cv2.putText(frame, "Text Print: " + chr(tem1) +".....", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    print(chr(tem1))
    cv2.imshow('frame',frame)
if str(tem1)!='0':
    row = ''.join(ch for ch, _ in itertools.groupby(f'{chr(tem1)}'))
    # create csv file to save the data.
    f.write(row)

best way to remove duplicate

NOTE: Order is important and this question is not similar to this one.

What about the result of `join`? Is that string correct? Why are you storing the result in `row` but in your csv you are writing `newrow`? — Jorge Luis, Mar 23 '23 at 07:53
I don't understand your call to `groupby`. It should receive the string whose duplicate characters have to be removed. Instead you are passing a string with a single character (which will never have a double character). — Jorge Luis, Mar 23 '23 at 07:57
@JorgeLuis Sorry, I have changed "newrow" to "row". I have tried this with f.write(row), but the result still contains duplicates when I save it to the CSV file. — 0nespo, Mar 24 '23 at 06:38
Without a reproducible example it is really hard to tell from the code you posted what you are trying to achieve, because you are doing some weird stuff. — Jorge Luis, Mar 24 '23 at 07:48
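
To illustrate the point raised in the comments: itertools.groupby can only collapse duplicates that it sees within a single iterable, so calling it on one character at a time changes nothing. The sketch below is only a guess at what the question intends. `collapse_runs` and `write_collapsed` are hypothetical helper names, and an in-memory `io.StringIO` stands in for the file handle `f`.

```python
import io
import itertools


def collapse_runs(text):
    """Collapse each run of identical consecutive characters into one."""
    return ''.join(ch for ch, _ in itertools.groupby(text))


def write_collapsed(chars, out):
    """Write characters that arrive one at a time, skipping immediate repeats."""
    last = None
    for ch in chars:
        if ch != last:  # only write when the character changes
            out.write(ch)
            last = ch


decoded = "Hhelloo ookkee"

# Per-character calls, as in the question: each input has length 1,
# so nothing is ever removed.
per_char = ''.join(collapse_runs(ch) for ch in decoded)

# One call on the whole string removes the consecutive repeats.
whole = collapse_runs(decoded)

# Streaming variant: remember the last character that was written.
buffer = io.StringIO()  # stand-in for the open file (assumption)
write_collapsed(decoded, buffer)
streamed = buffer.getvalue()
```

Here `per_char` is identical to the input, while `whole` and `streamed` both come out as 'Hhelo oke'.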

Headcrab · Answer 1 · 2023-04-11T18:28:08.067


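def remove_duplicates(s):
    if not s:  # an empty string has nothing to collapse
        return s
    acc = [s[0]]
    for c in s[1:]:
        # keep c only if it differs from the last kept character
        if acc[-1] != c:
            acc.append(c)
    return ''.join(acc)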

s = "Hhelloo ookkee"
print(remove_duplicates(s))

There's also a module called more-itertools (install with pip install more-itertools), that has a unique_justseen function which seems to do the same thing:

from more_itertools import unique_justseen

s = "Hhelloo ookkee"
print(''.join(unique_justseen(s)))

The output would be 'Hhelo oke', because 'H' and 'h' are, strictly speaking, different characters. If you want the comparison to be case-insensitive, normalize the case of the characters before comparing them. For simple examples with strings limited to the Latin alphabet, calling str.lower() would be enough, but it does not work for some Unicode characters, so for real-world text str.casefold() should be used instead; full Unicode caseless matching goes further still. E.g., for the first of the above code samples:

if acc[-1].casefold() != c.casefold(): acc.append(c)

And for the second, using the optional key argument:

unique_justseen(s, str.casefold)

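And in both cases it would probably be more efficient to casefold the entire string first, rather than doing it character by character when comparing. Note that this also changes the output: casefolding first gives 'helo oke' (what the question asks for), whereas comparing casefolded characters keeps the original case of the first character in each run and gives 'Helo oke'.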
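
As a rough sketch of both case-insensitive variants using only the standard library (the function names here are made up for illustration, and itertools.groupby is swapped in for unique_justseen):

```python
from itertools import groupby


def remove_duplicates_folded(s):
    """Casefold the whole string once, then collapse consecutive repeats."""
    folded = s.casefold()
    return ''.join(ch for ch, _ in groupby(folded))


def remove_duplicates_keep_case(s):
    """Compare casefolded characters, but keep the first character of each run as written."""
    return ''.join(next(run) for _, run in groupby(s, key=str.casefold))


s = "Hhelloo ookkee"
folded_result = remove_duplicates_folded(s)
kept_case_result = remove_duplicates_keep_case(s)
```

For this input `folded_result` is 'helo oke' and `kept_case_result` is 'Helo oke'. Which one is right depends on whether the original capitalisation should survive.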

edited Apr 11 '23 at 18:28

answered Mar 24 '23 at 01:02

Headcrab


Technically, str.lower() and str.upper() cannot give you case-insensitive comparison. These are asymmetric casing operations, not matching or comparison operations, and they do not remove all case distinctions. – Andj Apr 10 '23 at 06:22
@Andj Huh? "Hello, World!".lower() == "hello, world!".lower() -> True – Headcrab Apr 10 '23 at 16:31
The fact that "Hello, World!".lower() == "hello, world!".lower() is true is irrelevant. Not all lowercase characters have uppercase equivalents, not all uppercase characters have lowercase equivalents. Some uppercase characters map to two characters when lowercased. Also with casing, two types of casing are defined in Unicode: simple casing and full casing. @headcrab Unicode defines four types of caseless matching, the simplest is case-folding. str.lower() and str.upper() were only ever caseless matching in Python 2. I.e. when using encodings other than Unicode. – Andj Apr 11 '23 at 07:54
@Andj I see no point in overloading such a simple question with endless Unicode intricacies, but I've added str.casefold() and a link for further reading into the answer. OK now, or do you still see some room for exercises in pedantry? – Headcrab Apr 11 '23 at 18:34
Your answer works perfectly well for the question; I didn't imply otherwise. Rather, I was pointing out that your characterisation of the str.lower() operation was technically incorrect and a hangover from Python 2.x. For the question being answered, it doesn't matter, true. But Stack Overflow questions and answers are often searched after the fact; people asking questions are encouraged to search first and to ask only if they can't find an existing answer. So it is better to be exact for future readers than to leave partial answers where the distinctions matter. – Andj Apr 12 '23 at 07:30
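
For future readers, a small sketch of the distinction discussed in these comments. The sample strings are chosen here for illustration and are not from the original thread; they are written with escapes to avoid encoding problems.

```python
# 'Straße' and 'STRASSE' are the same word in different case.
with_sharp_s = "Stra\u00dfe"  # 'Straße'
all_caps = "STRASSE"

# lower() leaves the sharp s alone, so the two forms do not match.
same_with_lower = with_sharp_s.lower() == all_caps.lower()

# casefold() maps the sharp s to 'ss', so the two forms match.
same_with_casefold = with_sharp_s.casefold() == all_caps.casefold()

# Casing operations can also change the length of a string.
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered_length = len(dotted_capital_i.lower())
uppercased_sharp_s = "\u00df".upper()
```

With these inputs `same_with_lower` is False and `same_with_casefold` is True; lowercasing the dotted capital I yields two code points, and uppercasing the sharp s yields 'SS'.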
