4

How do I remove repeated letters in a string?

I tried this without success:

def shorten_string(char_str):
    new=''
    for i in range(0,len(char_str)-1):
       if char_str[i-1] != char_str[i]:
           new += char_str[i]
return new

EDIT: To clear up a misunderstanding: I do not want to delete all repeated characters, only those that are repeated consecutively.

input: lloolleellaa
output: lolela
Sander B

3 Answers

8

The logic is the same in any language, and this is a common interview question. Basically, you put each character of the string into a data structure; which one you pick depends on the language and on performance needs. Interviewers sometimes also ask whether the order of the characters matters.

>>> foo = 'haalllooo'
>>> ''.join(sorted(set(foo), key=foo.index))
'halo'
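As a rough sketch of the "put each character into a data structure" idea, an insertion-ordered dict can stand in for the set plus sort. The name `dedupe_all` is made up for illustration, and this assumes Python 3.7+, where dicts keep insertion order. Like the one-liner above, it drops every repeated character, not only adjacent ones.

```python
def dedupe_all(text):
    """Keep the first occurrence of each character, in original order.

    Hypothetical helper for illustration; relies on dicts preserving
    insertion order (guaranteed from Python 3.7).
    """
    # dict.fromkeys builds a dict whose keys are the distinct characters,
    # in the order they were first seen.
    return ''.join(dict.fromkeys(text))


print(dedupe_all('haalllooo'))     # halo
print(dedupe_all('lloolleellaa'))  # loea (the later 'l's are dropped too)
```

This avoids the repeated `foo.index` lookups, which rescan the string for every distinct character.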
roymustang86
  • See edit, I wrote the question in a bad way I guess. – Sander B Dec 05 '16 at 14:18
  • @SanderB Then the solution by Dan D. does what you need. – Keiwan Dec 05 '16 at 14:20
  • 1
    I upvoted this before I realized that it actually has a major bug that deletes **all** duplicated letters, not just the ones that are next to each other. By the time I realized the error, it was too late to take the upvote back. Beware this answer doesn't answer the question. – Barker May 05 '20 at 00:22
  • It worked for me too in the Roman Urdu case. Thanks – Bilal Chandio Oct 05 '20 at 19:48
8

Removing adjacent equal items can be done as follows with groupby:

>>> import itertools
>>> ''.join(c[0] for c in itertools.groupby('haalllooo'))
'halo'

This simply takes the head of each group of equal adjacent items.

>>> ''.join(c[0] for c in itertools.groupby('haalllooo thheeerrree tttthhhiiisss iiisss aaann eeeexxxaaammpppllleee'))
'halo there this is an example'
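For comparison, the loop attempted in the question can be made to work without itertools. In that attempt, `char_str[i-1]` at `i == 0` wraps around to the last character, and `range(0, len(char_str)-1)` stops one character early. A minimal sketch that compares each character with the last one kept instead (the name `squeeze` is only illustrative):

```python
def squeeze(text):
    """Collapse runs of identical adjacent characters into one."""
    kept = []
    for ch in text:
        # Keep the character only if it differs from the last one kept.
        if not kept or kept[-1] != ch:
            kept.append(ch)
    return ''.join(kept)


print(squeeze('lloolleellaa'))  # lolela
print(squeeze('haalllooo'))     # halo
```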

To keep only the unique items in order:

def unique(it):
    s = set()
    for x in it:
        if x not in s:
            s.add(x)
            yield x

This can be used like this:

>>> ''.join(unique('haalllooo'))
'halo'
>>> ''.join(unique('haalllooo thheeerrree tttthhhiiisss iiisss aaann eeeexxxaaammpppllleee'))
'halo terisnxmp'
Dan D.
  • You should consider adding this as an answer to the duplicate. A solution which only removes adjacent items isn't represented on that question yet. – Chris Mueller Dec 05 '16 at 14:37
2

My solution with regex:

>>> import re

>>> re.compile(r'(.)\1{1,}', re.IGNORECASE).sub(r'\1', "haalllooo thheeerrree tttthhhiiisss iiisss aaann eeeexxxaaammpppllleee")
'halo there this is an example'

But note that Dan's solution is 4x faster than the regex!
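One behavioural difference worth keeping in mind: with `re.IGNORECASE` the backreference also matches the other case, so `Aa` is collapsed to `A`, whereas `groupby` compares characters exactly. A small sketch of the two side by side (the function names and the `ignore_case` parameter are invented for this illustration):

```python
import itertools
import re


def squeeze_regex(text, ignore_case=False):
    """Collapse adjacent repeats with a backreference pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    # (.)\1+ matches a character followed by one or more copies of itself.
    return re.sub(r'(.)\1+', r'\1', text, flags=flags)


def squeeze_groupby(text):
    """Collapse adjacent repeats by taking the key of each group."""
    return ''.join(key for key, _group in itertools.groupby(text))


print(squeeze_regex('AaBbb', ignore_case=True))  # AB
print(squeeze_regex('AaBbb'))                    # AaBb
print(squeeze_groupby('AaBbb'))                  # AaBb
```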

Zoe
Thomas Decaux