How can we remove words consisting of a single repeated character?

Question

I am trying to collapse words that consist of a single repeated character down to one character, using regex in Python. For example:

good => good
gggggggg => g

What I have tried so far is the following:

re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')

The problem with the above solution is that it also changes good to god; I only want to collapse words that consist entirely of one repeated character.
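To make the behaviour concrete, here is a minimal sketch of the attempt above wrapped in a function (the name `collapse_all_runs` is made up for illustration). Because the pattern is not anchored, it collapses every run of a repeated lowercase letter, including the `oo` inside `good`.

```python
import re

def collapse_all_runs(s):
    # Unanchored pattern: matches any run of the same lowercase letter,
    # wherever it occurs in the string, and keeps one copy of it.
    return re.sub(r'([a-z])\1+', r'\1', s)

print(collapse_all_runs('ffffffbbbbbbbqqq'))  # fbq
print(collapse_all_runs('good'))              # god
```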

What do you mean by `single repeated character`? `o` is repeated in `good`, hence it is replaced. – Devesh Kumar Singh, Jun 05 '19 at 06:53
See [Python: Best Way to remove duplicate character from string](https://stackoverflow.com/questions/18799036/python-best-way-to-remove-duplicate-character-from-string). As for `good`, regex is not aware of natural-language spelling; you should add exceptions yourself. – Wiktor Stribiżew, Jun 05 '19 at 06:54
I don't want good to be replaced because it is an actual dictionary word. – Hrithik Puri, Jun 05 '19 at 06:55
You can't use regex to determine if a word is in the dictionary. This is impossible. – T Tse, Jun 05 '19 at 06:55
What I mean is that I want to collapse only words made of a single repeated character, like `mmmmmm => m` and `aaaaaaa => a`. – Hrithik Puri, Jun 05 '19 at 07:00
So words with more than one distinct character, whether or not any of them is repeated, will be left untouched? – Devesh Kumar Singh, Jun 05 '19 at 07:02
Got it, you can use a set here, which will be much simpler. Check my answer below @HrithikPuri – Devesh Kumar Singh, Jun 05 '19 at 07:14
Sure, I have added a regex solution as well @HrithikPuri, but I would suggest doing something simpler with a set instead of a regex. – Devesh Kumar Singh, Jun 05 '19 at 07:38
Words consisting of a single letter only, [like this](https://regex101.com/r/LcqQOJ/2)? Just add [word boundaries](https://www.regular-expressions.info/wordboundaries.html) to your pattern. – bobble bubble, Jun 05 '19 at 07:50
Good point @bobblebubble. Do you mind if I put that in my answer, referring to your comment? – Devesh Kumar Singh, Jun 05 '19 at 07:56
Yes, a word boundary or a start- and end-of-string marker should do the job. – Devesh Kumar Singh, Jun 05 '19 at 07:59

Devesh Kumar Singh · Accepted Answer · 2019-06-05T07:58:20.963

A better approach here is to use a set:

def modify(s):

    #Create a set from the string
    c = set(s)

    #If you have only one character in the set, convert set to string
    if len(c) == 1:
        return ''.join(c)
    #Else return original string
    else:
        return s

print(modify('good'))
print(modify('gggggggg'))

If you want to use regex, mark the start and end of the string in the pattern with ^ and $ (inspired by @bobblebubble's comment):

import re

def modify(s):

    #Create the sub string with a regex which only matches if a single character is repeated
    #Marking the start and end of string as well
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out

print(modify('good'))
print(modify('gggggggg'))

The output will be

good
g
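If the input is a whole sentence rather than a single word, the word-boundary idea from the comments could be sketched as follows. The function name `collapse_words` and the sample sentence are assumptions made for illustration, and the pattern still covers only lowercase ASCII letters, as in the code above.

```python
import re

def collapse_words(text):
    # \b on both sides means the repeated letter must make up the whole word,
    # so 'good' and 'food' are left alone while 'gggggggg' becomes 'g'.
    return re.sub(r'\b([a-z])\1+\b', r'\1', text)

print(collapse_words('good gggggggg mmmmmm food'))  # good g m food
```

With `^` and `$` the pattern checks the entire string; with `\b` it checks each word separately, which is usually what you want for free text.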

Chapyar · Answer 2 · 2019-06-10T04:39:08.380

2

You can use the Trim method:

Take a look at this example:

"ggggggg".Trim('g');

Update: for repeated characters in the middle of the string, use this function (thanks to this answer):

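In C# (the snippet relies on LINQ, so it is C# rather than Java):

using System.Linq;

public static string RemoveDuplicates(string input)
{
    // Distinct() drops every repeated character, keeping one of each
    return new string(input.ToCharArray().Distinct().ToArray());
}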

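In Python:

mylist = 'aaaaabbbbbcda'  # sample input (assumed); any iterable of characters works
used = set()
# used.add(x) returns None, so "or True" keeps each first-seen x while recording it
unique = [x for x in mylist if x not in used and (used.add(x) or True)]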

However, I think none of these approaches handles a string like aaaaabbbbbcda correctly: it has an a at the end, which does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:

In:

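def unique(s):
    # used holds at most the previous character: it is reset on every change
    used = set()
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            # x differs from the previous character, so a new run starts here
            ret.append(x)
            used = set()

        used.add(x)

    return ret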

print(unique('aaaaabbbbbcda'))

out:

['a', 'b', 'c', 'd', 'a']
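The same collapsing of adjacent repeats can also be sketched with `itertools.groupby` from the standard library. This is an alternative technique, not the function above, and the name `collapse_adjacent` is made up for illustration.

```python
from itertools import groupby

def collapse_adjacent(s):
    # groupby yields one (key, group) pair per run of equal adjacent
    # characters, so keeping only the keys keeps one character per run.
    return ''.join(key for key, _ in groupby(s))

print(collapse_adjacent('aaaaabbbbbcda'))  # abcda
```

Unlike the set-based comprehension, this keeps the trailing `a` because it only compares neighbouring characters.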

edited Jun 10 '19 at 04:39

answered Jun 05 '19 at 06:54


Hey, thanks for the contribution but I don't want to trim just 'g'. – Hrithik Puri Jun 05 '19 at 06:56
This is Java? But the question is tagged python – Devesh Kumar Singh Jun 05 '19 at 07:18
I added the Python code and also corrected some incorrect answers. – Chapyar Jun 10 '19 at 04:40

score 2 · Answer 3 · answered Jun 05 '19 at 08:40

If you do not want to use a set in your method, this should do the trick:

def simplify(s):
  l = len(s)
  if l>1 and s.count(s[0]) == l:
    return s[0]
  return s

print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))

output:

good
abba
g
g

Explanation:

Compute the length of the string.
Count the characters that are equal to the first one and compare that count with the string length.
If the string is longer than one character and the count equals the length, return the first character; otherwise return the whole string.
