I have searched the web (perhaps with the wrong search terms) without finding an answer. I have a very long regex pattern that I would like to match:

Example:

import re

re_pattern_str = r"I want to match this \(this is an example\) regular expression to a giant string"

sample_paragraph = "I want to match this (this is an example) regular expression to a giant string. This is a huge paragraph with a bunch of stuff in it"

print(re.match(re_pattern_str, sample_paragraph))

The output of the above program is as follows:

<re.Match object; span=(0, 78), match='I want to match this (this is an example) regular>

As you can see, it gets cut off and doesn't capture the whole string.

Also, I noticed that using verbose mode with a lot of comments ((?x) in Python) captures less. Does this mean there is a limit to how much can be captured? I also noticed using different Python versions and different machines caused different amounts of a long regex string to be captured. I still can't pinpoint if this is an issue in the re library in Python, a Python 3 specific thing (I haven't compared this to Python 2), a machine issue, memory issue, or something else.

I have used Python 3.8.1 for the above example, and have used Python 3.7.2 for another example using verbose regexes and other examples (I can't share these examples since those are proprietary).

Any help explaining the mechanics of Python's regex engine and why this happens (and, if there is a maximum length that can be captured via regex, why that limit exists) would be very helpful.

ShadowRanger
qxzsilver
  • Why do you think it would match more of the string? Your pattern doesn't match the whole string. I suspect your verbose behavior is because you changed something else by accident. Point is, there is no length limit you're likely to hit with any string you can show us here (*maybe* it might have problems matching up in the multi-GB range, but I have no evidence for this). – ShadowRanger Mar 17 '20 at 18:37
  • Your regex contains no optional elements, and is therefore incapable of matching anything less than the full pattern. The `__repr__()` of the match object is evidently trimming the display, presumably to avoid overwhelming your terminal in the case of a *truly* long match. – jasonharper Mar 17 '20 at 18:45
  • `m.group(0)` returns `'I want to match this (this is an example) regular expression to a giant string'`, which *does* match the entire expression. The `repr()` of the match object isn't what you should go by. If you want to capture *all* of any string that begins with the expression, then put a `.*` at the end of the expression. – Todd Mar 17 '20 at 21:38
  • A duplicate of [Python extract pattern matches](https://stackoverflow.com/questions/15340582/python-extract-pattern-matches). And of [How do I return a string from a regex match in python?](https://stackoverflow.com/q/18493677/3832970) and many more. – Wiktor Stribiżew Mar 17 '20 at 22:43

1 Answer

You are treating the repr of the match object as the matched text. It isn't. The repr deliberately shortens its preview of the match so that large matches don't dump pages of text. If you want to see the complete matched text, index into the match object to get it as a string:

print(re.match(re_pattern_str, sample_paragraph)[0])
                                               #^^^ gets the matched text itself

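The span shown in the repr already tells you the match is longer than the preview: it covers indices 0 to 78 (end exclusive), which is all 78 characters of the text your pattern describes.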
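
As a minimal sketch of the point above, here is the question's own pattern and text with the repr, the matched string, and the span printed side by side. The names `m`, `matched_text`, and `m_all` are illustrative and not from the original post. The last step uses the `.*` suffix suggested in the comments, for the case where the goal is to capture the rest of the paragraph as well.

```python
import re

re_pattern_str = r"I want to match this \(this is an example\) regular expression to a giant string"
sample_paragraph = (
    "I want to match this (this is an example) regular expression to a giant string. "
    "This is a huge paragraph with a bunch of stuff in it"
)

m = re.match(re_pattern_str, sample_paragraph)

# repr(m) is a debugging aid; its match='...' preview is shortened for display.
print(repr(m))

# The actual matched text: m[0] is shorthand for m.group(0) (Python 3.6+).
matched_text = m[0]
print(matched_text)

# span() returns the start index and the exclusive end index of the match.
start, end = m.span()
print(start, end, len(matched_text))

# The pattern only describes the first sentence, so the match stops there.
# Appending .* extends the match to the end of the line (there is no newline here).
m_all = re.match(re_pattern_str + r".*", sample_paragraph)
print(m_all[0] == sample_paragraph)
```

The shortened preview only affects what is printed for the match object. `m[0]`, `m.group(0)`, and `m.span()` always reflect the full match.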

ShadowRanger
  • Please consider closing the evident duplicate question rather than re-answering suchlike questions again and again. This question is asked too often, and the answer is short and always the same. It is a duplicate of [Python extract pattern matches](https://stackoverflow.com/questions/15340582/python-extract-pattern-matches). And of [How do I return a string from a regex match in python?](https://stackoverflow.com/q/18493677/3832970) and many more. – Wiktor Stribiżew Mar 17 '20 at 22:45
  • @WiktorStribiżew: I do, when I remember a duplicate exists. This one didn't have an obvious duplicate, or at least I couldn't remember the query to find it. – ShadowRanger Mar 17 '20 at 23:57