How to remove duplicated characters from a string?

Question

How can I remove all repeated characters from a string?

For example:

Input:  string = 'Hello'
Output: 'Heo'

This is a different question from "Removing duplicate characters from a string", as I don't want to print out the duplicates; I want to delete them.

Is your question restricted to consecutive characters? Or do you want a word like `sports` to become `port`? — normanius, Nov 11 '19 at 11:46
Actually not: I'm trying to completely remove all the duplicated characters and keep only the characters that appear exactly once... in the post above the output is a set :\ — EddyIT, Nov 11 '19 at 16:17
If your question is perceived wrongly, you may want to rewrite it. Possibly add a few more examples to make your case clearer. What about the answers below? In my view, they achieve what you are asking for. — normanius, Nov 13 '19 at 09:03

score 7 · Accepted Answer · answered Nov 11 '19 at 09:35

You can use a generator expression with `str.join`, keeping only the characters that occur exactly once:

>>> x = 'Hello'
>>> ''.join(c for c in x if x.count(c) == 1)
'Heo'

han solo

This is unnecessarily `O(n**2)` – yatu Nov 11 '19 at 09:37
Sure. But the user didn't raise any concern regarding complexity – han solo Nov 11 '19 at 09:38
So? I can't see why that is a good reason to not go with the most efficient answer – yatu Nov 11 '19 at 09:38
Premature optimization is the root of all evil? :) I would do it with a `dict` if I were worried about time complexity, though – han solo Nov 11 '19 at 09:44
This complexity is not `O(n**2)`, it is at worst `O(nm)` :) – han solo Nov 11 '19 at 10:23
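
The `dict`-based variant mentioned in the comments above was never posted, so the following is only a rough sketch of what it might look like. The function name `remove_repeated` is made up for illustration. It counts each character in one pass and filters in a second pass, so the work grows linearly with the length of the string instead of calling `count` once per character.

```python
def remove_repeated(text):
    """Return text without any character that occurs more than once."""
    counts = {}
    for ch in text:
        # First pass: tally how often each character occurs.
        counts[ch] = counts.get(ch, 0) + 1
    # Second pass: keep only the characters that were seen exactly once.
    return ''.join([ch for ch in text if counts[ch] == 1])


print(remove_repeated('Hello'))   # Heo
print(remove_repeated('sports'))  # port
```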

score 5 · Answer 2 · answered Nov 11 '19 at 09:35

You could build a `Counter` from the string and then keep only the characters whose count is exactly one:

from collections import Counter

string = 'Hello'
c = Counter(string)  # maps each character to its number of occurrences
''.join([i for i in string if c[i]==1])
# 'Heo'

yatu

No. It is faster precisely because `join` implicitly will create one anyways, that is why feeding it a list is a better idea @han Note that `join` internally has to iterate over the string first to calculate its size, so you wouldn't be able to perform the operation anyways if it did not fit in memory – yatu Nov 11 '19 at 09:54
Okay. Good to know. Thanks :) – han solo Nov 11 '19 at 09:55
No worries. I'll leave the complexity warning anyways, for future visitors at least, it is important to take into account for when performance *is* an issue @han – yatu Nov 11 '19 at 10:03
`join` won't create a `list` internally, now that I ask around. See [src](https://github.com/python/cpython/blob/master/Objects/unicodeobject.c#L10017). But still, this is very much a micro-optimization, I am told. – han solo Nov 11 '19 at 10:05
Hmm from what I have understood it does, could be wrong though. Note that you brought this up though :) @han My main objection was about the difference in complexity – yatu Nov 11 '19 at 10:12
Okay. Just i have seen the `list` being created for join in some other places too. Sorry about that :) – han solo Nov 11 '19 at 10:13
Isn't it better to change the loop to -- for i in c ? Because then you loop through each letter exactly once? A small optimization? :) – Arun Nov 11 '19 at 13:20
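
The suggestion in the last comment, looping over the counter instead of the string, could be sketched as below. The name `unique_chars` is hypothetical. The sketch assumes Python 3.7 or later, where `Counter` (a `dict` subclass) keeps keys in first-insertion order; on older versions the output order would not be guaranteed.

```python
from collections import Counter


def unique_chars(text):
    """Keep the characters of text that occur exactly once."""
    counts = Counter(text)
    # Iterating over the counter visits each distinct character once,
    # in order of first appearance (guaranteed from Python 3.7 on).
    return ''.join([ch for ch, n in counts.items() if n == 1])


print(unique_chars('Hello'))   # Heo
print(unique_chars('sports'))  # port
```

Because a character that is kept occurs only once, its position among the counter's keys matches its position in the original string, so the result is the same as when looping over the string.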

score 1 · Answer 3 · answered Nov 11 '19 at 09:37

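a = 'Hello'

# Work on a list of the individual characters
list_a = list(a)

# Collect the characters that occur exactly once
output = []
for i in list_a:
    if list_a.count(i) == 1:
        output.append(i)

print(''.join(output))  # Heo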

Thotsaphon Sirikutta

score 1 · Answer 4 · answered Nov 11 '19 at 09:40

In addition to the other answers, a filter is also possible:

s = 'Hello'
result = ''.join(filter(lambda c: s.count(c) == 1, s))
# result - Heo

Ofer Sadan

normanius · Answer 5 · 2019-11-13T09:07:00.923

If you limit your question to cases with only repeated consecutive letters (as your example suggests), you could employ regular expressions:

import re
print(re.sub(r"(.)\1+", "", "hello"))     # result = heo
print(re.sub(r"(.)\1+", "", "helloo"))    # result = he
print(re.sub(r"(.)\1+", "", "hellooo"))   # result = he
print(re.sub(r"(.)\1+", "", "sports"))    # result = sports

If you need to apply the regular expression many times, it's worth compiling it beforehand:

prog = re.compile(r"(.)\1+")
print(prog.sub("", "hello"))

To restrict the search for duplicated letters to some subset of characters, you can adjust the regular expression accordingly:

print(re.sub(r"(\S)\1+", "", "hello"))     # Search duplicated non-whitespace chars
print(re.sub(r"([a-z])\1+", "", "hello"))  # Search for duplicated lowercase letters
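
With the input "hello" all of these patterns give the same result, so here is a small illustrative sketch with inputs where they differ. The helper name `strip_runs` and the sample strings are made up for this example.

```python
import re


def strip_runs(pattern, text):
    # Delete every match of pattern; each pattern used below matches
    # a run of two or more identical characters from some character class.
    return re.sub(pattern, "", text)


print(strip_runs(r"(.)\1+", "a  b"))        # ab    (the doubled space is a run too)
print(strip_runs(r"(\S)\1+", "a  b"))       # a  b  (whitespace runs are left alone)
print(strip_runs(r"(.)\1+", "AAbbC"))       # C
print(strip_runs(r"([a-z])\1+", "AAbbC"))   # AAC   (only lowercase runs are removed)
```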

Alternatively, an approach using `itertools.groupby` and a list comprehension could look as follows:

from itertools import groupby
def dedup(s):
    # Keep a character only if its run of consecutive repeats has length 1
    return "".join([i for i, g in groupby(s) if len(list(g)) == 1])
print(dedup("hello"))     # result = heo
print(dedup("helloo"))    # result = he
print(dedup("hellooo"))   # result = he
print(dedup("sports"))    # result = sports

Note that on my machine the first method, using regular expressions, was about 8-10 times faster than the second one using `groupby`. (System: Python 3.6.7, MacBook Pro (Mid 2015))
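
To check the timing on your own machine, something along the following lines may help. This is only a sketch: the function names `dedup_regex` and `dedup_groupby`, the test string, and the repetition count are arbitrary choices, and the measured ratio will depend on the Python version, the hardware, and the input.

```python
import re
import timeit
from itertools import groupby

RUN = re.compile(r"(.)\1+")  # a run of two or more identical characters


def dedup_regex(s):
    return RUN.sub("", s)


def dedup_groupby(s):
    return "".join([ch for ch, g in groupby(s) if len(list(g)) == 1])


text = "hellooo sports " * 1000  # arbitrary sample input without newlines

# Both approaches should agree before their speed is compared.
assert dedup_regex(text) == dedup_groupby(text)

t_regex = timeit.timeit(lambda: dedup_regex(text), number=20)
t_groupby = timeit.timeit(lambda: dedup_groupby(text), number=20)
print(f"regex:   {t_regex:.4f} s")
print(f"groupby: {t_groupby:.4f} s")
```

Note that `.` does not match a newline by default, so the two functions can differ on input containing repeated newlines unless the pattern is compiled with `re.DOTALL`.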
