
I am trying to match two or more consecutive repetitions of ha in the following code, but the () seems to be acting as a capturing group. Why isn't it working?

>>> import re
>>> pattern="this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"
>>> print(re.findall("(ha){2,}",pattern))
['ha', 'ha']

I wanted results to be:

['hahahaha', 'hahaha']

How do I fix it?

John Kugelman
Webair

1 Answer

import re

s = '"this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"'
result = re.findall(r'(?:ha){2,}', s)

print(result)

The output:

['hahahahaha', 'hahaha']

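  • (?:ha){2,} - matches the literal sequence ha two or more times ({2,}), with ha enclosed in a non-capturing group

  • (?:...) - groups the enclosed pattern without capturing it

  • With a capturing group, as in (ha){2,}, re.findall returns what the group captured instead of the whole match, and a repeated group keeps only its last repetition; that is why the original pattern returned ['ha', 'ha']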
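
As a small side-by-side sketch of the difference (the variable names `capturing` and `non_capturing` are just for illustration and are not part of the original code), the same input can be run through both patterns:

```python
import re

s = "this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"

# Capturing group: findall returns what the group captured,
# and a repeated group only keeps its last repetition.
capturing = re.findall(r"(ha){2,}", s)

# Non-capturing group: the pattern has no groups,
# so findall returns each whole match.
non_capturing = re.findall(r"(?:ha){2,}", s)

print(capturing)      # ['ha', 'ha']
print(non_capturing)  # ['hahahahaha', 'hahaha']
```

Only the non-capturing version gives the full runs of ha, because findall falls back to whole matches when the pattern contains no capturing groups.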

RomanPerekhrest
  • what is ?: doing here in ` (r'((?:ha){2,})', s) ` – Webair Jan 21 '18 at 20:56
  • @Webair, added explanation – RomanPerekhrest Jan 21 '18 at 21:02
  • @RomanPerekhrest why does a non-capturing group help here? (I know, but I think it'd be good to include in the answer) – John Szakmeister Jan 21 '18 at 21:06
  • I am confused about one thing: when I run ` pattern="this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"; print(re.findall(r"((ha){2,})",pattern)) ` why does it give ` [('hahahahaha', 'ha'), ('hahaha', 'ha')] `? Where does the second element 'ha' come from? – Webair Jan 21 '18 at 21:07
  • @Webair, because in the latter case you have 2 capturing groups: one with the whole match `r'(....)'` and another with the sequence `(ha)`. Visually one group is nested in the other, but each is reported separately when the match results are generated – RomanPerekhrest Jan 21 '18 at 21:11
  • @RomanPerekhrest still a bit confused... isn't the latter case, re.findall(r"((ha){2,})",pattern), supposed to match two or more occurrences of 'ha', like 'haha', 'hahaha', 'hahahaha'? – Webair Jan 21 '18 at 21:18
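
To illustrate the point discussed in these comments, here is a minimal sketch (the names `nested` and `whole` are made up for the example). The nested pattern does match the full runs of ha; findall just reports one tuple per match, holding the contents of each capturing group. If the inner group has to stay capturing, re.finditer with group(0) is one way to get the whole matches anyway:

```python
import re

s = "this is a joke hahahahahaaa. I cannot stop laughing hahahaaa"

# Two capturing groups: findall returns one tuple per match.
# Element 0 is the outer group (the whole run), element 1 is the
# inner group, which keeps only its last repetition ('ha').
nested = re.findall(r"((ha){2,})", s)
print(nested)  # [('hahahahaha', 'ha'), ('hahaha', 'ha')]

# finditer yields match objects; group(0) is always the entire match,
# regardless of how many capturing groups the pattern has.
whole = [m.group(0) for m in re.finditer(r"(ha){2,}", s)]
print(whole)   # ['hahahahaha', 'hahaha']
```

So the second element 'ha' in each tuple is simply the inner (ha) group, not a separate match.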