
I am trying to collapse words that consist of a single repeated character down to one character, using regex in Python. For example:

good => good
gggggggg => g

What I have tried so far is the following:

re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')

The problem with the above solution is that it also changes good to god. I only want to change words that are made up entirely of one repeated character.
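To make the problem concrete, here is a minimal check of what the unanchored pattern does. The helper name collapse is only for illustration; the pattern is the one from the attempt above.

```python
import re

def collapse(s):
    # Unanchored: every run of a repeated lowercase letter is collapsed,
    # wherever it occurs inside the string.
    return re.sub(r'([a-z])\1+', r'\1', s)

print(collapse('good'))
print(collapse('ffffffbbbbbbbqqq'))
```

Because nothing ties the match to the start and end of the word, the oo inside good counts as a run just like a word that is all one letter.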

Hrithik Puri

3 Answers


A better approach here is to use a set:

def modify(s):

    #Create a set from the string
    c = set(s)

    #If you have only one character in the set, convert set to string
    if len(c) == 1:
        return ''.join(c)
    #Else return original string
    else:
        return s

print(modify('good'))
print(modify('gggggggg'))

If you want to use regex, mark the start and end of the string in the pattern with ^ and $ (inspired by @bobblebubble's comment):

import re

def modify(s):

    #The pattern only matches if the whole string is a single repeated character,
    #because ^ and $ mark the start and end of the string
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out

print(modify('good'))
print(modify('gggggggg'))

The output will be

good
g
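If the input is a sentence rather than a single word, one possible variation is to anchor on word boundaries instead of the string's start and end. This is only a sketch: the name modify_words and the sample sentence are assumptions, not part of the question, and \b treats digits and underscores as word characters too.

```python
import re

# Hypothetical helper: collapse only whole words made of one repeated letter.
# \b ties the match to word boundaries rather than to ^ and $.
SINGLE_CHAR_WORD = re.compile(r'\b([a-z])\1+\b')

def modify_words(text):
    return SINGLE_CHAR_WORD.sub(r'\1', text)

print(modify_words('good gggggggg foo'))
```

Words such as good or foo are left alone because the repeated run does not span the whole word.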
Devesh Kumar Singh

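You can use a trim command. Take a look at this example in C# (note that Trim strips every leading and trailing occurrence, so a string made only of g comes back empty):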

"ggggggg".Trim('g');

Update: for repeated characters in the middle of the string, use this function (thanks to this answer).

In C# (Distinct needs using System.Linq):

public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}

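In Python:

mylist = list('aaaaabbbbbcda')  # example input; any iterable of hashable items works
used = set()
# Keep the first occurrence of each item, preserving order
unique = [x for x in mylist if x not in used and (used.add(x) or True)]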

However, I think none of these answers handle a case like aaaaabbbbbcda: the string ends with an a that does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:

In:

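def unique(s):
    # Collapse each run of consecutive identical characters to one character
    used = set()  # holds only the character of the current run
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            # x starts a new run: keep it and forget the previous run's character
            ret.append(x)
            used = set()

        used.add(x)

    return ret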

print(unique('aaaaabbbbbcda'))

out:

['a', 'b', 'c', 'd', 'a']
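The same run-collapsing behaviour can probably be had from the standard library. Here is a minimal sketch using itertools.groupby; the name collapse_runs is made up for this example.

```python
from itertools import groupby

def collapse_runs(s):
    # groupby yields one (key, group) pair per run of equal consecutive items,
    # so keeping only the keys keeps one character per run.
    return [key for key, _group in groupby(s)]

print(collapse_runs('aaaaabbbbbcda'))
```

Unlike the set-based deduplication, this keeps the trailing a because it starts a new run.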
Chapyar

If you do not want to use a set in your method, this should do the trick:

def simplify(s):
  l = len(s)
  if l>1 and s.count(s[0]) == l:
    return s[0]
  return s

print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))

output:

good
abba
g
g

Explanations:

  • You compute the length of the string; the l>1 check also keeps s[0] from being evaluated on an empty string.
  • You count the characters that are equal to the first one and compare that count with the string length.
  • Depending on the result, you return the first character or the whole string.
Allan