How to remove duplicate chars in a string?

Question

I've got this problem and I simply can't get it right. I have to remove duplicate chars from a string.

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"

The output should be: "O rato roeu a roupa do rei de roma"

I tried things like:

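def remove_duplicates(value):
    var = ""
    for i in value:
        if i in value:  # always true, since i is taken from value
            if i in var:
                pass
            else:
                var = var + i
    return var

print(remove_duplicates(phrase))  # prints "o rateupdim": duplicates are dropped across the whole phrase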

But it's not there yet...

Any pointers to guide me here?

Does this answer your question? [Removing duplicate characters from a string](https://stackoverflow.com/questions/9841303/removing-duplicate-characters-from-a-string) — aliyousefian, Dec 15 '21 at 13:44
Your example output does not have duplicate chars removed, it seems to have repeating substrings removed. Is this what you want? — vaizki, Dec 15 '21 at 13:46
This sounds like you should ask a new question and clarify exactly what the criteria are. — Daniel Walker, Dec 15 '21 at 16:24

vaizki · Accepted Answer · score 5 · 2021-12-15T14:15:17.817

It seems from your example that you want to remove REPEATED SEQUENCES of characters, not duplicate chars across the whole string. So this is what I'm solving here.

You can use a regular expression. Not sure how horribly inefficient it is, but it works.

>>> import re
>>> phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'o rato roeu a roupa do rei de roma'

How this substitution proceeds down the string:

oo -> o
" " -> " "
rara -> ra
to -> to
" " -> " "
roeroe -> roe

etc.
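To list every substitution rather than tracing them by hand, re.finditer with the same pattern yields each matched run together with the unit it collapses to. A minimal sketch; repeated_runs is a hypothetical helper name, not part of the answer:

```python
import re

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"

def repeated_runs(text):
    # each match is a whole run (group 0) built from one unit (group 1) repeated 2+ times
    return [(m.group(0), m.group(1)) for m in re.finditer(r'(.+?)\1+', text)]

for run, unit in repeated_runs(phrase):
    print(f"{run!r} -> {unit!r}")
```

Runs are found left to right without overlapping, the same way re.sub scans, so the nine pairs printed are exactly the nine replacements the substitution makes.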

Edit: It also works for the other example string, which should not be modified:

>>> phrase = "Barbara Bebe com Bernardo"
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'Barbara Bebe com Bernardo'

edited Dec 15 '21 at 14:15

answered Dec 15 '21 at 13:52

vaizki


Neat! How does the regex work? – Daniel Walker Dec 15 '21 at 13:56

Well, (.+?) captures the shortest sequence of one or more characters into \1, and \1+ (+ means 1 or more) then requires one or more copies of that same sequence right after it; the whole run is replaced with just one \1. Magic. I will edit into the answer how the substitution proceeds. – vaizki Dec 15 '21 at 14:00
Update: I'm too confused now about what is really the requirement, I'll let the dust settle on this question for a while :) – vaizki Dec 15 '21 at 14:28
Thank you for your help. But "Barbara bebe com Bernardo" still outputs "Barbara be com Bernardo". It seems that "Bebe" and "bebe" give different results. – Paulo Feresin Dec 15 '21 at 15:15

Seems like your desired solution needs contextual knowledge of the Portuguese language, such as a natural language parser. The original question was about duplicate characters or repeated sequences; this new requirement is just too far off the original to be handled here. – vaizki Dec 16 '21 at 12:34
This leaves the words `'raxra'` and `'xrayra'` unchanged. I understand they should become `'rax'` and `'xray'`. – Cary Swoveland Apr 02 '23 at 00:21
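The pattern explained in these comments can be written out with re.VERBOSE so that each part carries its own comment. A sketch (REPEATED_RUN and collapse_repeats are illustrative names); it also shows why 'raxra' survives: \1+ only matches copies that follow the unit immediately.

```python
import re

# Same pattern as the answer, spelled out with re.VERBOSE so each part can carry a comment.
REPEATED_RUN = re.compile(r"""
    (.+?)   # group 1: the shortest unit of one or more characters
    \1+     # followed immediately by one or more copies of that same unit
""", re.VERBOSE)

def collapse_repeats(text):
    # replace each whole run with a single copy of its unit
    return REPEATED_RUN.sub(r"\1", text)

print(collapse_repeats("oo rarato roeroeu aa rouroupa dodo rerei dde romroma"))
print(collapse_repeats("raxra"))  # the second "ra" does not follow the first directly, so nothing changes
```

The lazy +? matters: with a greedy (.+), re.sub collapses "oooo" to "oo" instead of "o", because the longest unit that repeats is tried first.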

Daniel Walker · Answer 2 · score 2 · 2021-12-15T13:52:13.823

What you can do is form a set out of the string and then sort the remaining letters according to their original order.

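def remove_duplicates(word):
    unique_letters = set(word)
    # word.index gives each letter's first position, which restores the original order
    sorted_letters = sorted(unique_letters, key=word.index)  # this will give you a list
    return ''.join(sorted_letters)

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
words = phrase.split(' ')
new_phrase = ' '.join(remove_duplicates(word) for word in words)
print(new_phrase)  # o rato roeu a roupa do rei de roma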
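A variant that swaps in dict.fromkeys for the set-plus-sort step (a substitution, not this answer's code): dict keys are unique and keep insertion order, so each letter's first occurrence survives without calling word.index for every letter. A sketch:

```python
def remove_duplicates(word):
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7),
    # so this keeps the first occurrence of each letter in a single pass
    return ''.join(dict.fromkeys(word))

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
new_phrase = ' '.join(remove_duplicates(word) for word in phrase.split(' '))
print(new_phrase)
```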

edited Dec 15 '21 at 13:52

answered Dec 15 '21 at 13:46

Daniel Walker


Yes, this solves it. Thank you. Follow-up problem: the phrase "Barbara bebe com Bernardo" should not be corrected, as it doesn't have duplicated chars. See what I mean? – Paulo Feresin Dec 15 '21 at 13:55

Glad to help. If you have a second question, you should post it separately. Also, don't forget to click the check-mark next to your preferred answer. – Daniel Walker Dec 15 '21 at 13:56

@PauloFeresin your question might not have been clear. My answer works for the other string too (it doesn't change it) ;) – vaizki Dec 15 '21 at 14:13
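To make the follow-up concrete, this sketch runs both answers' techniques on the two spellings of the follow-up phrase used in the comments (the function names are illustrative):

```python
import re

def collapse_repeats(text):
    # accepted answer: collapse immediately repeated substrings
    return re.sub(r'(.+?)\1+', r'\1', text)

def dedupe_per_word(text):
    # this answer: keep each letter's first occurrence within every word
    return ' '.join(''.join(sorted(set(w), key=w.index)) for w in text.split(' '))

for text in ("Barbara Bebe com Bernardo", "Barbara bebe com Bernardo"):
    print(repr(collapse_repeats(text)), repr(dedupe_per_word(text)))
```

The regex leaves "Bebe" alone only because matching is case-sensitive, and it collapses "bebe" to "be". The per-word approach rewrites "Barbara" and "Bernardo" in both spellings.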

user2678074 · Answer 3 · 2021-12-15T14:24:38.607

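A string in Python is a sequence of chars, right? It isn't a list, but you can iterate over it like one. Sequences can have duplicates... sets cannot. So, if we convert the string to a set, then back to a list, we'll get the characters without duplicates ;P (a set does not keep their original order, though).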

I've seen a suggestion to use regex for replacing patterns. This will work, but it is a slow and overcomplicated solution (and hard for humans to read, too). Regex is a heavy and costly weapon.

Also, you are not removing duplicates from the string as a whole, but from each word in the string:

1. Split your string into a list of words.
2. For each word, remove duplicate letters.
3. Join the words back into a string.

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
words = phrase.split(' ')
# words: ['oo', 'rarato', 'roeroeu', 'aa', 'rouroupa', 'dodo', 'rerei', 'dde', 'romroma']

words_without_duplicates = []
for word in words:
    # set(word) drops repeated letters, but a set has no defined order
    word_without_duplicates = ''.join(set(word))
    words_without_duplicates.append(word_without_duplicates)
phrase = ' '.join(words_without_duplicates)

phrase is then something like 'o oatr oeur a auopr od eir ed oamr' (the letter order inside each word comes from the set, so it can differ between runs).

Of course, that can be optimized, but you wanted to be guided, so this shows the idea better. It should be faster than regex, too.
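Whether the split-based approach is actually faster than the regex is easy to check with timeit. A rough sketch, assuming a per-word version that uses dict.fromkeys instead of a bare set so that both functions return the same string. No timings are claimed here, since they depend on the machine and Python version:

```python
import re
import timeit

PHRASE = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
REPEATED_RUN = re.compile(r'(.+?)\1+')

def with_regex(text):
    # accepted answer: collapse immediately repeated substrings
    return REPEATED_RUN.sub(r'\1', text)

def with_words(text):
    # split into words and keep each letter's first occurrence (dict.fromkeys keeps order)
    return ' '.join(''.join(dict.fromkeys(w)) for w in text.split(' '))

if __name__ == "__main__":
    for fn in (with_regex, with_words):
        seconds = timeit.timeit(lambda: fn(PHRASE), number=10_000)
        print(f"{fn.__name__}: {seconds:.4f} s for 10,000 calls")
```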

It's a sequence, so you can easily treat it like a list: for char in phrase: print(char). And you can manipulate the list to remove any kind of duplicates you wish. Also, he's not strictly removing duplicates from the string he provided; he needs to split the string into words, then remove duplicates. That will return the expected result. — user2678074, Dec 15 '21 at 13:59
What about the phrase 'Ann has an apapple and a hohoover?' Obviously, these words are supposed to have repeated characters, so strictly removing duplicates will not work. — S3DEV, Dec 15 '21 at 14:14
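Running both techniques on this phrase shows that neither rule handles it fully. A sketch using the accepted answer's regex and a per-word first-occurrence filter:

```python
import re

text = "Ann has an apapple and a hohoover?"

# accepted answer: collapse immediately repeated substrings
by_regex = re.sub(r'(.+?)\1+', r'\1', text)
# per-word approach: keep the first occurrence of each character in every word
by_letters = ' '.join(''.join(dict.fromkeys(w)) for w in text.split(' '))

print(by_regex)
print(by_letters)
```

The regex turns "apapple" and "hohoover" into "apple" and "hoover" but also shortens "Ann" to "An"; the per-word filter produces "An", "aple" and "hover?".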

Mr.F.K · Answer 4 · score 0 · answered Dec 15 '21 at 14:48

I added a space at the end of the phrase. After that, this works:

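# the trailing space makes sure the last word is printed too
phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma "
ch = ""
for i in phrase:
    if i == " ":
        # end of a word: print the letters collected so far, then start over
        print(ch)
        ch = ""
    # keep only the first occurrence of each character in the current word
    if i not in ch:
        ch = ch + i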

Output

o
 rato
 roeu
 a
 roupa
 do
 rei
 de
 roma
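The same character scan can be wrapped in a function that returns the cleaned phrase instead of printing each word, and that flushes the last word after the loop so no trailing space is needed. A sketch (dedupe_each_word is a hypothetical name); using elif also keeps the space out of the next word, which is where the leading spaces in the output above come from:

```python
def dedupe_each_word(text):
    words = []
    ch = ""
    for i in text:
        if i == " ":
            # end of a word: store it and start collecting the next one
            words.append(ch)
            ch = ""
        elif i not in ch:
            # keep only the first occurrence of each character in the current word
            ch = ch + i
    words.append(ch)  # the last word has no space after it, so flush it here
    return " ".join(words)

print(dedupe_each_word("oo rarato roeroeu aa rouroupa dodo rerei dde romroma"))
```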

answered Dec 15 '21 at 14:48

Mr.F.K


As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 15 '21 at 17:21
