3

How can I remove all repeated characters from a string?

e.g:

Input:  string = 'Hello'
Output: 'Heo'

This is a different question from "Removing duplicate characters from a string": I don't want to print out the duplicates, I want to delete them.

EddyIT
  • 2
    Is your question restricted to consecutive characters? Or do you want a word like `sports` to become `port`? – normanius Nov 11 '19 at 11:46
  • Actually not: I'm trying to completely remove all the duplicates and keep only the characters that appear exactly once... in the post above the output is a set :\ – EddyIT Nov 11 '19 at 16:17
  • If your question is perceived wrongly, you may want to rewrite it. Possibly add a few more examples to make your case clearer. What about the answers below? In my view, they achieve what you are asking for. – normanius Nov 13 '19 at 09:03
  • indeed they do.. – EddyIT Nov 13 '19 at 11:20
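
To make the two readings discussed in these comments concrete, here is a minimal sketch contrasting them. The helper names `drop_repeated_anywhere` and `drop_repeated_runs` are made up for illustration and do not come from the question.

```python
from collections import Counter
from itertools import groupby


def drop_repeated_anywhere(s):
    # Hypothetical helper: keep only characters that occur exactly once in s.
    counts = Counter(s)
    return "".join(ch for ch in s if counts[ch] == 1)


def drop_repeated_runs(s):
    # Hypothetical helper: drop only characters repeated back to back.
    return "".join(ch for ch, run in groupby(s) if len(list(run)) == 1)


print(drop_repeated_anywhere("Hello"))   # Heo
print(drop_repeated_runs("Hello"))       # Heo
print(drop_repeated_anywhere("sports"))  # port
print(drop_repeated_runs("sports"))      # sports
```

Both readings agree on `Hello`, which is probably why the example in the question is ambiguous; they only differ on inputs like `sports`, where the repeated character is not consecutive.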

5 Answers

7

You can use a generator expression with `join`, keeping only the characters that occur exactly once:

>>> x = 'Hello'
>>> ''.join(c for c in x if x.count(c) == 1)
'Heo'
han solo
5

You could construct a Counter from the string, then keep only the characters whose count is 1:

from collections import Counter

string = 'Hello'
c = Counter(string)  # maps each character to its number of occurrences
''.join([i for i in string if c[i] == 1])
# 'Heo'
yatu
  • No. It is faster precisely because `join` implicitly will create one anyways, that is why feeding it a list is a better idea @han Note that `join` internally has to iterate over the string first to calculate its size, so you wouldn't be able to perform the operation anyways if it did not fit in memory – yatu Nov 11 '19 at 09:54
  • 1
    Okay. Good to know. Thanks :) – han solo Nov 11 '19 at 09:55
  • No worries. I'll leave the complexity warning anyways, for future visitors at least, it is important to take into account for when performance *is* an issue @han – yatu Nov 11 '19 at 10:03
  • `join` won't create a `list` internally, now that I have asked around. See [src](https://github.com/python/cpython/blob/master/Objects/unicodeobject.c#L10017). But still, this is very much a micro-optimization, I am told. – han solo Nov 11 '19 at 10:05
  • Hmm from what I have understood it does, could be wrong though. Note that you brought this up though :) @han My main objection was about the difference in complexity – yatu Nov 11 '19 at 10:12
  • Okay. Just i have seen the `list` being created for join in some other places too. Sorry about that :) – han solo Nov 11 '19 at 10:13
  • Isn't it better to change the loop to -- for i in c ? Because then you loop through each letter exactly once? A small optimization? :) – Arun Nov 11 '19 at 13:20
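
The suggestion in the last comment could be sketched roughly as follows. It relies on an assumption worth stating: on Python 3.7+ a `Counter` (a `dict` subclass) keeps keys in first-insertion order, so the characters that occur once come out in the same order as in the string. On older versions the order is not guaranteed.

```python
from collections import Counter

string = 'Hello'
c = Counter(string)

# Iterate over the distinct characters in the counter instead of the whole string.
# Assumes Python 3.7+, where Counter keeps first-insertion order.
result = ''.join([ch for ch, n in c.items() if n == 1])
print(result)  # Heo
```

This only saves work when the string contains many repeated characters; the counting pass over the whole string is still needed either way.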
1
a = 'Hello'

list_a = list(a)

# Keep only the characters that occur exactly once
output = []
for i in list_a:
    if list_a.count(i) == 1:
        output.append(i)

''.join(output)  # 'Heo'
1

In addition to the other answers, a filter is also possible:

s = 'Hello'
result = ''.join(filter(lambda c: s.count(c) == 1, s))
# result - Heo
Ofer Sadan
1

If you limit your question to cases with only repeated consecutive letters (as your example suggests), you could employ regular expressions:

import re
print(re.sub(r"(.)\1+", "", "hello"))     # result = heo
print(re.sub(r"(.)\1+", "", "helloo"))    # result = he
print(re.sub(r"(.)\1+", "", "hellooo"))   # result = he
print(re.sub(r"(.)\1+", "", "sports"))    # result = sports

If you need to re-apply the regular expression many times, it's worth compiling it beforehand:

prog = re.compile(r"(.)\1+")
print(prog.sub("", "hello"))

To restrict the search for duplicated letters to a subset of characters, you can adjust the regular expression accordingly:

print(re.sub(r"(\S)\1+", "", "hello"))     # Search duplicated non-whitespace chars
print(re.sub(r"([a-z])\1+", "", "hello"))  # Search for duplicated lowercase letters

Alternatively, an approach using `itertools.groupby` and a list comprehension could look as follows:

from itertools import groupby
def dedup(s):
    # Keep a character only if its run of consecutive repeats has length 1
    return "".join([i for i, g in groupby(s) if len(list(g)) == 1])
print(dedup("hello"))     # result = heo
print(dedup("helloo"))    # result = he
print(dedup("hellooo"))   # result = he
print(dedup("sports"))    # result = sports

Note that on my machine the first method (regular expressions) was about 8-10 times faster than the second one (groupby). (System: Python 3.6.7, MacBook Pro (Mid 2015))
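
If you want to check the timing on your own machine, a rough sketch with `timeit` could look like this. The function names, the sample input and the repeat count are arbitrary choices for illustration, not the setup used for the measurement above, so the ratio you see may differ.

```python
import re
import timeit
from itertools import groupby

prog = re.compile(r"(.)\1+")


def dedup_regex(s):
    # Remove every run of two or more identical consecutive characters.
    return prog.sub("", s)


def dedup_groupby(s):
    # Keep a character only if its run of consecutive repeats has length 1.
    return "".join([ch for ch, g in groupby(s) if len(list(g)) == 1])


sample = "hellooo sports " * 1000  # arbitrary test input (an assumption)

# Both approaches should agree before timing them.
assert dedup_regex(sample) == dedup_groupby(sample)

for fn in (dedup_regex, dedup_groupby):
    seconds = timeit.timeit(lambda: fn(sample), number=100)
    print(f"{fn.__name__}: {seconds:.4f} s for 100 runs")
```

Note that `.` does not match a newline by default, so the two functions can differ on input containing repeated newlines unless the pattern is compiled with `re.DOTALL`.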

normanius