
I am trying to find a regex that matches if a string contains two different characters (or groups) repeated exactly the same number of times, where that number is not fixed in advance: it can be any reasonable positive number. For instance, the regex should match 'aabbcc' but not 'aabbccc'. It should match only if a and c are repeated the same number of times.

Obviously, if I try 'a+[^ac]*c+', it will match if the string contains any number of repetitions of a and c, starting from one. If I needed both characters to be repeated a specific number of times, then 'a{n}[^ac]*c{n}' could work, where n is the number of repetitions. But neither works for me, because I need the regex to match only if both characters are repeated exactly the same number of times, and that number isn't specified. Thanks.
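To make the problem concrete, here is a small Python check of the two patterns from the question using re.fullmatch (the test strings are illustrative). The unbounded pattern also accepts 'aabbccc', and the fixed pattern with n = 2 rejects 'aaabbbccc' even though its counts are equal.

```python
import re

# The two patterns from the question, applied to whole strings.
unbounded = re.compile(r'a+[^ac]*c+')      # one or more a's and c's, counts unrelated
fixed_two = re.compile(r'a{2}[^ac]*c{2}')  # exactly two of each, i.e. n = 2

for s in ['aabbcc', 'aabbccc', 'aaabbbccc']:
    print(s, bool(unbounded.fullmatch(s)), bool(fixed_two.fullmatch(s)))
# aabbcc True True
# aabbccc True False
# aaabbbccc True False
```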

Wiktor Stribiżew
Learner 0
  • You can't achieve it with plain `re` regex. – Wiktor Stribiżew Apr 06 '22 at 10:38
  • Have you checked this out? https://stackoverflow.com/questions/64623841/regex-to-get-the-size-of-capturing-group – Harshal Pawar Apr 06 '22 at 10:44
  • You could set the quantifier for the backreference dynamically and assemble the pattern using code `^(?:(.)\1{2})+$` https://regex101.com/r/1EHptp/1 If the parts cannot be the same next to each other `^(?:(.)\1{1}(?!\1))+$` https://regex101.com/r/nlTQYu/1 – The fourth bird Apr 06 '22 at 10:46
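The idea in the last comment, computing the quantifier in code and then assembling the pattern, can be adapted to the question's a...c case roughly as below. This is a sketch, not the commenter's backreference pattern: equal_a_c is a hypothetical helper name, and it assumes the whole string must be one run of a's, optional filler containing neither a nor c, then one run of c's.

```python
import re


def equal_a_c(s: str) -> bool:
    """Hypothetical helper: True if s is a-run + filler + c-run with equal run lengths."""
    # Measure the leading run of a's first.
    lead = re.match(r'a+', s)
    if lead is None:
        return False
    n = len(lead.group())
    # Build the question's pattern with the measured n, e.g. 'a{3}[^ac]*c{3}'.
    pattern = rf'a{{{n}}}[^ac]*c{{{n}}}'
    return re.fullmatch(pattern, s) is not None


for s in ['aabbcc', 'aabbccc', 'aaabbbccc']:
    print(s, equal_a_c(s))
# aabbcc True
# aabbccc False
# aaabbbccc True
```

The regex itself still only knows one fixed n at a time; the equality check comes from reusing the measured length in both quantifiers.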

1 Answer


A possible way to bypass the limitations (or just mine!) of the re regex syntax is to find all sequences matching r'(a+)(b+)(c+)' and then filter them by checking that all the captured runs have the same length.

import re

s = '-----aabbcccjllkd   aaabbbccc, aabc aaaabbbbcccc'
regex = r'(?P<a>a+)(?P<b>b+)(?P<c>c+)'

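# Find every run of a's followed by b's followed by c's.
f = re.finditer(regex, s)

# Keep a match only if every group's length equals the next group's length.
print(list(''.join(m.groups()) for m in f if all(map(lambda p: p[0] == p[1], zip(tmp:=list(map(len, m.groups())), tmp[1:])))))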

Output

['aaabbbccc', 'aaaabbbbcccc']
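The same filter can be written without the walrus operator. In this sketch, same_lengths is a hypothetical helper name; it relies on the fact that all runs have equal length exactly when the set of their lengths has a single element. Because the three groups are adjacent and cover the whole match, m.group() is the same string as ''.join(m.groups()).

```python
import re

s = '-----aabbcccjllkd   aaabbbccc, aabc aaaabbbbcccc'
regex = re.compile(r'(?P<a>a+)(?P<b>b+)(?P<c>c+)')


def same_lengths(m: re.Match) -> bool:
    # One distinct length means every captured run is equally long.
    return len({len(g) for g in m.groups()}) == 1


result = [m.group() for m in regex.finditer(s) if same_lengths(m)]
print(result)  # ['aaabbbccc', 'aaaabbbbcccc']
```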

Remarks:

  • := is restricted inside comprehensions: used in a comprehension's iterable expression, it raises SyntaxError: assignment expression cannot be used in a comprehension iterable expression. This can be bypassed, for example, by using map.

  • Using itertools, it can be prettified a bit, e.g. all(it.starmap(int.__eq__, zip(tmp:=list(map(len, m.groups())), tmp[1:]))) with import itertools as it.
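The first remark can be checked directly. In this small sketch (the lists are made up), := inside a comprehension's condition is accepted, while := in the iterable expression is rejected at compile time. This is why the answer's code places the assignment inside the if clause.

```python
# := in a comprehension's condition is accepted (Python 3.8+).
allowed = [tmp for m in ['ab', 'abc'] if (tmp := len(m)) > 2]
print(allowed)  # [3]

# := in the comprehension's iterable expression is rejected when compiling.
error_message = ''
try:
    compile('[x for x in (tmp := [1, 2])]', '<example>', 'eval')
except SyntaxError as err:
    error_message = err.msg
print(error_message)
```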

cards