How to remove duplicates only if consecutive in a string?

Question

For a string such as '12233322155552', by removing the duplicates, I can get '1235'.

But what I want is '1232152', removing only the consecutive duplicates.

Paulo Freitas · Answer 1 · 2019-11-11T16:30:18.690

21

import re

# Only repeated numbers
answer = re.sub(r'(\d)\1+', r'\1', '12233322155552')

# Any repeated character
answer = re.sub(r'(.)\1+', r'\1', '12233322155552')

edited Nov 11 '19 at 16:30

answered Jul 16 '12 at 06:01

Paulo Freitas


2

Use `r'(.)\1+'` to generalize this solution for any repeated character, and `r'(\S)\1+'` to any *non-whitespace* character. – normanius Nov 11 '19 at 13:40
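To make the difference between the three patterns concrete, here is a small sketch. The helper name `collapse` and the sample string are made up for illustration and do not come from the answer.

```python
import re

def collapse(pattern, text):
    # Replace every run matched by `pattern` with the single captured character.
    return re.sub(pattern, r'\1', text)

sample = 'aa  bb11'  # note the two spaces in the middle

digits_only = collapse(r'(\d)\1+', sample)  # collapses only the run of digits
any_char = collapse(r'(.)\1+', sample)      # collapses letters, spaces and digits
non_space = collapse(r'(\S)\1+', sample)    # leaves the double space alone

print(repr(digits_only))  # 'aa  bb1'
print(repr(any_char))     # 'a b1'
print(repr(non_space))    # 'a  b1'
```

Note that `.` does not match a newline by default, so repeated line breaks stay untouched unless `flags=re.DOTALL` is passed to `re.sub`.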

score 15 · Answer 2 · answered May 24 '17 at 11:57


You can use itertools; here is a one-liner:

>>> import itertools
>>> s = '12233322155552'
>>> ''.join(i for i, _ in itertools.groupby(s))
'1232152'

answered May 24 '17 at 11:57

akash karothiya


score 10 · Answer 3 · answered Jul 12 '12 at 21:22


This is a Microsoft / Amazon job-interview type of question. Here is the pseudocode; the actual code is left as an exercise.

for each char in the string do:
    if the current char is equal to the next char:
        delete the next char and check the current char again
    else:
        continue with the next char

return string

At a higher level (not an actual implementation):

for s in string:
    if s == s+1:  ## check until the end of the string
        delete s+1
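For readers who want to check their attempt, one possible way to turn the pseudocode into Python is sketched below. It is not the answerer's code: the function name `remove_consecutive_duplicates` is hypothetical, and because Python strings are immutable, the sketch builds a new sequence of kept characters instead of deleting in place.

```python
def remove_consecutive_duplicates(text):
    # Strings are immutable, so collect the characters to keep in a list.
    kept = []
    for char in text:
        # Skip the character if it repeats the one kept just before it.
        if kept and kept[-1] == char:
            continue
        kept.append(char)
    return ''.join(kept)

print(remove_consecutive_duplicates('12233322155552'))  # 1232152
```

Comparing against the last kept character, rather than the next input character, handles runs of three or more without any index bookkeeping.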

answered Jul 12 '12 at 21:22

cybertextron


6

Good call on not giving exact code (though Python is pretty darn close to pseudocode already). – John Y Jul 12 '12 at 21:28

score 7 · Answer 4 · answered Jul 12 '12 at 21:33

Hint: the itertools module is super-useful. One function in particular, itertools.groupby, might come in really handy here:

itertools.groupby(iterable[, key])

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.

So since strings are iterable, what you could do is:

use groupby to collect neighbouring elements
extract the keys from the iterator returned by groupby
join the keys together

which can all be done in one clean line.
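The three steps above can be sketched as follows. The variable names `runs` and `result` are illustrative; the run lengths are printed only to show what `groupby` collects and are not needed for the final answer.

```python
from itertools import groupby

s = '12233322155552'

# Step 1: groupby yields one (key, group) pair per run of equal neighbours.
runs = [(key, len(list(group))) for key, group in groupby(s)]
print(runs)
# [('1', 1), ('2', 2), ('3', 3), ('2', 2), ('1', 1), ('5', 4), ('2', 1)]

# Steps 2 and 3: keep only the keys and join them back into a string.
result = ''.join(key for key, _ in groupby(s))
print(result)  # 1232152
```

No sorting is needed here, since the goal is to group neighbouring characters only; sorting first would merge non-adjacent duplicates as well.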

score 2 · Answer 5 · edited Mar 11 '14 at 04:00


First of all, you can't remove anything from a string in Python (google "Python immutable string" if this is not clear).

My first approach would be:

foo = '12233322155552'
bar = ''
for ch in foo:
    # append only when the character differs from the last one kept
    if not bar or ch != bar[-1]:
        bar += ch

or, using the itertools hint from above:

from itertools import groupby
''.join(k for k, _ in groupby(foo))

edited Mar 11 '14 at 04:00

AndyG


answered Jul 12 '12 at 23:49

paul


score 1 · Answer 6 · answered Jul 12 '12 at 22:46

+1 for groupby. Off the cuff, something like:

from itertools import groupby
def remove_dupes(arg):
    # create generator of distinct characters, ignore grouper objects
    unique = (i[0] for i in groupby(arg))
    return ''.join(unique)

Cooks for me in Python 2.7.2

score 1 · Answer 7 · answered Feb 19 '17 at 14:05


number = '12233322155552'
temp_list = []

for item in number:
    # keep the item if nothing is kept yet or it differs from the last kept item
    if not temp_list or temp_list[-1] != item:
        temp_list.append(item)

print(''.join(temp_list))

answered Feb 19 '17 at 14:05

Fuji Komalan


score 1 · Answer 8 · edited Nov 22 '18 at 15:37


This would be a way:

def fix(a):
    result = []  # avoid shadowing the built-in list

    for element in a:
        # keep the element if nothing is kept yet or it differs from the last kept element
        if not result or result[-1] != element:
            result.append(element)

    print(''.join(result))


a = 'GGGGiiiiniiiGinnaaaaaProtijayi'
fix(a)
# output => GiniGinaProtijayi

edited Nov 22 '18 at 15:37

Ma0


answered Apr 20 '18 at 16:18

Soudipta Dutta


score 0 · Answer 9 · edited Feb 03 '13 at 03:47


import re

t = '12233322155552'
for i in t:
    # escape the pair so regex metacharacters such as '.' are matched literally
    dup = re.escape(i + i)
    t = re.sub(dup, i, t)

The final value of t is '1232152'.

edited Feb 03 '13 at 03:47

pradyunsg


answered Jul 16 '12 at 05:28

Prasanna

