I have a string:

s = 'bubble'

How can I use a regular expression to get a list like:

['b', 'u', 'bb', 'l', 'e']

I want to filter single as well as double occurrences of a letter.

anon582847382
NBA

2 Answers

This should do it:

import re

[m.group(0) for m in re.finditer('(.)\\1*',s)]

For 'bubbles' this returns:

['b', 'u', 'bb', 'l', 'e', 's']

For 'bubblesssss' this returns:

['b', 'u', 'bb', 'l', 'e', 'sssss']
matterhayes
If you only want to match one or two occurrences of a letter, you can use ? instead of * in the regex. You can also use a raw string so the backslash does not need escaping. – Thayne Feb 23 '14 at 18:56
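To make the difference between the two quantifiers concrete, here is a minimal sketch that runs both patterns on the same input, written with raw strings. The variable names `any_run` and `one_or_two` are only illustrative.

```python
import re

s = "bubblesssss"

# Raw strings (r"...") let us write \1 without doubling the backslash.
# (.)\1* : a character followed by any number of repeats of itself.
any_run = [m.group(0) for m in re.finditer(r"(.)\1*", s)]

# (.)\1? : a character optionally followed by one repeat, so at most two.
one_or_two = [m.group(0) for m in re.finditer(r"(.)\1?", s)]

print(any_run)     # ['b', 'u', 'bb', 'l', 'e', 'sssss']
print(one_or_two)  # ['b', 'u', 'bb', 'l', 'e', 'ss', 'ss', 's']
```

With ?, a run longer than two is cut into pairs plus a possible leftover single, which may or may not be what you want.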

You really have two questions. The first is how to split the string, the second is how to filter the result.

The splitting takes advantage of backreferences in a pattern. Here we construct a pattern that finds one or two occurrences of the same character, then build a list from the matches. The \1 in the code block refers to whatever the first parenthesized group matched.

import re
pattern = re.compile(r'(.)\1?')
s = "bubble"
result = [x.group() for x in pattern.finditer(s)]
print(result)

To filter the list stored in result you could use a list comprehension that filters on length.

filtered_result = [x for x in result if len(x) == 2]
print(filtered_result)

You could also get the doubled letters directly by tweaking the regular expression so that the repeat is required rather than optional.

pattern2 = re.compile(r'(.)\1')
result2 = [x.group() for x in pattern2.finditer(s)]
print(result2)

The output from running the above is:

['b', 'u', 'bb', 'l', 'e']
['bb']
['bb']
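One caveat: . matches any character except a newline, not only letters, so doubled spaces or digits are grouped too. If only letters should count, a character class can replace the dot. The following is a sketch under that assumption; the sample string and the names `any_char` and `letters_only` are made up for illustration.

```python
import re

text = "too  good"  # note the two spaces in the middle

# With '.', the doubled space is reported as a group of its own.
any_char = [m.group() for m in re.finditer(r"(.)\1?", text)]

# With a letter class, non-letters never match and are skipped by finditer.
letters_only = [m.group() for m in re.finditer(r"([A-Za-z])\1?", text)]

print(any_char)      # ['t', 'oo', '  ', 'g', 'oo', 'd']
print(letters_only)  # ['t', 'oo', 'g', 'oo', 'd']
```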
John Percival Hackworth