How to find double occurrence of a letter in a word

Question

I have a string:

s = 'bubble'

How can I use a regular expression to get a list like this:

['b', 'u', 'bb', 'l', 'e']

I want to filter single as well as double occurrences of a letter.

See the following SO post: http://stackoverflow.com/questions/6306098/regexp-match-repeated-characters — user1749431, Feb 23 '14 at 18:52

score 4 · Answer 1 · answered Feb 23 '14 at 18:54


This should do it:

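import re

s = 'bubble'  # the string from the question
# (.) captures one character; \1* matches any further repeats of that same character
[m.group(0) for m in re.finditer('(.)\\1*', s)]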

For 'bubbles' this returns:

['b', 'u', 'bb', 'l', 'e', 's']

For 'bubblesssss' this returns:

['b', 'u', 'bb', 'l', 'e', 'sssss']
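As a cross-check that is not part of the original answer, the same "split into runs of identical characters" can be done without a regex using `itertools.groupby`. This is a swapped-in technique, shown only to confirm what `(.)\1*` computes. One caveat: `.` does not match `\n` unless `re.DOTALL` is set, so the two only agree on strings without newlines.

```python
from itertools import groupby

def runs(s):
    # groupby yields consecutive runs of equal characters; join each run back into a string
    return [''.join(group) for _, group in groupby(s)]

bubbles = runs('bubbles')
bubbles_long = runs('bubblesssss')
print(bubbles)       # ['b', 'u', 'bb', 'l', 'e', 's']
print(bubbles_long)  # ['b', 'u', 'bb', 'l', 'e', 'sssss']
```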


matterhayes


If you only want to match up to a double occurrence, you can use ? instead of * in the regex. You can also use a raw string instead of having to escape the backslash. – Thayne Feb 23 '14 at 18:56
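To illustrate the comment, here is a small sketch comparing the two quantifiers on 'bubblesssss' (the longer example from the answer), written with raw strings. With `*` each whole run is one match; with `?` each match is at most two characters, so a longer run is chopped into pieces.

```python
import re

s = 'bubblesssss'
# * lets the back reference repeat any number of times: one match per run
star = [m.group(0) for m in re.finditer(r'(.)\1*', s)]
# ? allows at most one repeat: every match is one or two characters long
at_most_two = [m.group(0) for m in re.finditer(r'(.)\1?', s)]
print(star)         # ['b', 'u', 'bb', 'l', 'e', 'sssss']
print(at_most_two)  # ['b', 'u', 'bb', 'l', 'e', 'ss', 'ss', 's']
```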

score 0 · Accepted Answer · answered Feb 23 '14 at 18:59

You really have two questions. The first is how to split the string, the second is how to filter the result.

The splitting takes advantage of back references in a pattern. In this case we'll construct a pattern that finds one or two occurrences of a letter, then build a list from the search results. The \1 in the code block refers back to whatever the first parenthesized group matched, so it can only match a repeat of that same character.

import re
pattern = re.compile(r'(.)\1?')
s = "bubble"
result = [x.group() for x in pattern.finditer(s)]
print(result)
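To make the back reference visible, this sketch prints, for each match of the same pattern, what the group `(.)` captured (`group(1)`) next to the full match (`group(0)`). The full match is two characters only where the next character equals the captured one.

```python
import re

pattern = re.compile(r'(.)\1?')
# group(1) is the single character captured by (.); group(0) also includes the optional \1 repeat
pairs = [(m.group(1), m.group(0)) for m in pattern.finditer('bubble')]
print(pairs)  # [('b', 'b'), ('u', 'u'), ('b', 'bb'), ('l', 'l'), ('e', 'e')]
```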

To filter the list stored in result you could use a list comprehension that filters on length.

filtered_result = [x for x in result if len(x) == 2]
print(filtered_result)
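Filtering on length interacts with the choice of quantifier. A sketch, using the longer input 'bubblesssss' (not part of the original question): with `\1?` a run of five s's is chopped into pairs, so the length filter reports them; with `\1*` each run is a single match, so `len(x) == 2` keeps only letters that occur exactly twice in a row.

```python
import re

s = 'bubblesssss'
chunked = [m.group() for m in re.finditer(r'(.)\1?', s)]     # runs split into pieces of 1 or 2
whole_runs = [m.group() for m in re.finditer(r'(.)\1*', s)]  # one piece per run
# Keep only the two-character pieces from each split
from_chunked = [x for x in chunked if len(x) == 2]
exactly_two = [x for x in whole_runs if len(x) == 2]
print(from_chunked)  # ['bb', 'ss', 'ss']
print(exactly_two)   # ['bb']
```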

You could also get just the doubled letters directly by making the back reference mandatory in the regular expression.

pattern2 = re.compile(r'(.)\1')
result2 = [x.group() for x in pattern2.finditer(s)]
print(result2)

The output from running the above is:

['b', 'u', 'bb', 'l', 'e']
['bb']
['bb']
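A note on why both answers use `finditer` and `group()` rather than the shorter `re.findall`: when a pattern contains a capturing group, `findall` returns what the group captured instead of the whole match. A minimal sketch:

```python
import re

s = 'bubble'
# With one capturing group, findall returns the group's contents, not the full match
via_findall = re.findall(r'(.)\1', s)
# finditer + group() returns the full matched text
via_finditer = [m.group() for m in re.finditer(r'(.)\1', s)]
print(via_findall)   # ['b']
print(via_finditer)  # ['bb']
```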
