
I have word = abccabcc and I need to find all repeating groups using the following regex:

regex = (.+)(.+)\1\2

So basically word = uvuv, where one valid pair is u = abc and v = c. However, findall() from Python's re library returns only that pair and not all possible pairs, such as u = ab, v = cc.

I also tried the overlapped feature of the regex library, but with no success.

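import re

word = "abccabcc"
regex = r"(.+)(.+)\1\2"
chunkRegex = re.compile(regex)
# findall() returns only non-overlapping matches, scanned left to right
sub = chunkRegex.findall(word)
print(sub) # [('abc', 'c')]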

Expected output for the given example (there may possibly be more valid matches):

[('abc', 'c'), ('ab', 'cc'), ('a', 'bcc')]

Example in online regex matcher: https://regex101.com/r/1IZUpp/1
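
One possible way to stay within the re library is sketched below. It assumes the whole word must have the form uvuv with non-empty u and v, which is an interpretation of the question rather than something it states outright. Instead of asking one pattern for every split, it fixes the length of u with a counted quantifier and tries each length in turn with fullmatch(). The function name all_splits is hypothetical.

```python
import re

def all_splits(word):
    """Return every (u, v) with non-empty u, v such that word == u + v + u + v."""
    results = []
    # Try each fixed length for u, longest first, so the order mirrors
    # what the greedy pattern (.+)(.+)\1\2 would prefer.
    for k in range(len(word) // 2 - 1, 0, -1):
        # Pinning the length of group 1 removes the ambiguity that makes
        # findall() report only a single split.
        pattern = re.compile(r"(.{%d})(.+)\1\2" % k)
        m = pattern.fullmatch(word)
        if m:
            results.append(m.groups())
    return results

print(all_splits("abccabcc"))  # [('abc', 'c'), ('ab', 'cc'), ('a', 'bcc')]
```

Each call still returns at most one match, so the enumeration comes from the Python loop and not from the regex engine. If matches inside a longer string are also wanted, the same idea would need an extra loop over start positions.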

Wiktor Stribiżew
Anav
  • I tried something similar to what you suggest, but it returns abccabcc. I think there may be something wrong due to overlapping, but I don't know how to fix it – Anav Jun 05 '20 at 07:44
  • I would reopen my question because it's not a duplicate – Anav Jun 05 '20 at 07:49
  • You should edit your question and include your exact expected output for this case, that would help make the question clear. – Thierry Lathuille Jun 05 '20 at 07:51
  • I said it can be. And later in the post I said it is not returning all possible values of u and v, only the first example. I expect my program to return a list of all possible overlapping groups. For my example it should be: (abc, c), (ab, cc), etc. – Anav Jun 05 '20 at 07:54
  • Please clean up the comments. I think the problem should not be solved with a regex. It does not work the way you expect it to. – Wiktor Stribiżew Jun 05 '20 at 08:12
  • Your question title is wrong: `re.findall()` doesn't claim to match all possible combinations of all the possible patterns that match your regex: it matches the first pattern it can. So your question should be more like 'how do I find all possible matches to a regex'? That may be a duplicate of this https://stackoverflow.com/questions/7383818/get-all-possible-matches-for-regex-in-python - I don't think it can be solved just with regex. – DisappointedByUnaccountableMod Jun 05 '20 at 08:14
  • @WiktorStribiżew I know how to solve this without regex, but unfortunately I need to use the re library. – Anav Jun 05 '20 at 08:19
  • @barny yes, you are correct, I should have rephrased the title because findall returns all non-overlapping matches – Anav Jun 05 '20 at 08:21
  • Then you are stuck forever. Regexps do not work this way, back and forth throughout the string. – Wiktor Stribiżew Jun 05 '20 at 08:22
  • @JvdV I tried, but surprisingly it returns the same as the standard re – Anav Jun 05 '20 at 08:24

0 Answers