regex with {} not working

Question

I am trying to match 2 or more consecutive occurrences of ha in the following code, but the () seems to be acting as a capturing group, so only 'ha' is returned. Why isn't it working?

>>> import re
>>> pattern="this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"
>>> print(re.findall("(ha){2,}",pattern))
['ha', 'ha']

I wanted results to be:

['hahahaha', 'hahaha']

How do I fix it?

https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do — Brad Solomon, Jan 21 '18 at 20:56

RomanPerekhrest · Accepted Answer · 2018-01-21T21:00:43.873


import re

s = '"this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"'
result = re.findall(r'(?:ha){2,}', s)

print(result)

The output:

['hahahahaha', 'hahaha']

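(?:ha){2,} - matches the sequence ha literally 2 or more times ({2,}). The parentheses group ha so that the quantifier applies to the whole sequence, and ?: makes that group non-capturing.
(?:...) - groups the enclosed pattern without capturing it. This matters because re.findall returns the contents of the capturing groups whenever the pattern contains any, and a repeated capturing group such as (ha){2,} only keeps its last repetition, which is 'ha'. With no capturing groups in the pattern, re.findall returns the whole match instead.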
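
To make the difference concrete, here is a small side-by-side sketch of the capturing and non-capturing variants on the string from the question. The variable names (text, capturing, non_capturing, wrapped) are only illustrative and not part of the original code.

```python
import re

text = "this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"

# Capturing group: findall returns the group's contents,
# and a repeated group only keeps its last repetition.
capturing = re.findall(r"(ha){2,}", text)

# Non-capturing group: the pattern has no capturing groups,
# so findall returns the whole match.
non_capturing = re.findall(r"(?:ha){2,}", text)

# One outer capturing group around the non-capturing one:
# findall returns that single group, which spans the whole run.
wrapped = re.findall(r"((?:ha){2,})", text)

print(capturing)      # ['ha', 'ha']
print(non_capturing)  # ['hahahahaha', 'hahaha']
print(wrapped)        # ['hahahahaha', 'hahaha']
```

The wrapped form is the one quoted in the first comment below; it gives the same list as the plain non-capturing pattern because the pattern has exactly one capturing group.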

edited Jan 21 '18 at 21:00

answered Jan 21 '18 at 20:50

RomanPerekhrest

what is ?: doing here in ` (r'((?:ha){2,})', s) ` – Webair Jan 21 '18 at 20:56
@Webair, added explanation – RomanPerekhrest Jan 21 '18 at 21:02
@RomanPerekhrest why does a non-capturing group help here? (I know, but I think it'd be good to include in the answer) – John Szakmeister Jan 21 '18 at 21:06
I am confused about one thing: when I do `pattern="this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"; print(re.findall(r"((ha){2,})",pattern))`, why does it give `[('hahahahaha', 'ha'), ('hahaha', 'ha')]`? Where does the second element 'ha' come from? – Webair Jan 21 '18 at 21:07
@Webair, because in the latter case you have 2 captured groups: one with the whole match `r'(....)'` and another one with sequence `(ha)`. Visually you observe one group nested in another, but they are all considered separately when generating the results of matching – RomanPerekhrest Jan 21 '18 at 21:11
@RomanPerekhrest still a bit confused... isn't the latter case, `re.findall(r"((ha){2,})",pattern)`, supposed to match more than two occurrences of 'ha', like 'haha', 'hahaha', 'hahahaha'? – Webair Jan 21 '18 at 21:18
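
On that last point, a short sketch with re.finditer may help. It assumes the same input string, and the names text, nested and whole_matches are made up for illustration. The pattern does match the whole run of ha; re.findall just reports the capturing groups rather than the match itself, and each match object makes that visible.

```python
import re

text = "this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"

# group(0) is always the whole match; group(1) is the outer (...) and
# group(2) is the inner (ha), which only holds its last repetition.
nested = [
    (m.group(0), m.group(1), m.group(2))
    for m in re.finditer(r"((ha){2,})", text)
]

# Even with the original capturing pattern, group(0) gives the full run.
whole_matches = [m.group(0) for m in re.finditer(r"(ha){2,}", text)]

print(nested)
# [('hahahahaha', 'hahahahaha', 'ha'), ('hahaha', 'hahaha', 'ha')]
print(whole_matches)
# ['hahahahaha', 'hahaha']
```

So the regex itself was matching correctly all along; the non-capturing group is one way to get re.findall to return the matches, and re.finditer with group(0) is another.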
