I have scoured the web (perhaps with the wrong search terms) and have not found an answer. I have a very long regex pattern that I would like to match:
Ex:
import re
re_pattern_str = r"I want to match this \(this is an example\) regular expression to a giant string"
sample_paragraph = "I want to match this (this is an example) regular expression to a giant string. This is a huge paragraph with a bunch of stuff in it"
print(re.match(re_pattern_str, sample_paragraph))
The output of the above program is as follows:
<re.Match object; span=(0, 78), match='I want to match this (this is an example) regular>
As you can see, the printed match is cut off after "regular" and doesn't seem to capture the whole string.
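One way to narrow this down is to read the match through span() and group(0) instead of relying on the printed object. This is only a sketch built from the example above; the names m, start, end and matched_text are introduced here for illustration. If the two lengths agree and the full sentence prints, the cut-off would be in how the match object is displayed rather than in what was captured.

```python
import re

re_pattern_str = r"I want to match this \(this is an example\) regular expression to a giant string"
sample_paragraph = (
    "I want to match this (this is an example) regular expression to a giant string. "
    "This is a huge paragraph with a bunch of stuff in it"
)

m = re.match(re_pattern_str, sample_paragraph)

# span() and group(0) are read from the match object itself,
# not from its printed representation.
start, end = m.span()
matched_text = m.group(0)

print(end - start)        # length of the match according to the span
print(len(matched_text))  # length of the text actually captured
print(matched_text)       # the captured text, printed in full
```

Comparing len(matched_text) against the span on each Python version or machine would show whether the captured amount really differs between them.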
Also, I noticed that using verbose mode ((?x) in Python) with a lot of comments captures less. Does this mean there is a limit to how much can be captured? I also noticed that different Python versions and different machines captured different amounts of a long regex string. I still can't pinpoint whether this is an issue in Python's re library, something specific to Python 3 (I haven't compared this to Python 2), a machine issue, a memory issue, or something else.
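To check whether verbose mode itself changes what is captured, a comparison along these lines might help. It is only a sketch: the verbose pattern is a hypothetical rewrite of the example pattern above (not one of the proprietary ones), and the names text, plain, verbose, plain_match and verbose_match are introduced for illustration.

```python
import re

text = (
    "I want to match this (this is an example) regular expression to a giant string. "
    "This is a huge paragraph with a bunch of stuff in it"
)

plain = re.compile(
    r"I want to match this \(this is an example\) regular expression to a giant string"
)

# Hypothetical verbose version of the same pattern. In verbose mode, unescaped
# whitespace and "#" comments are ignored, so literal spaces are written as "\ ".
verbose = re.compile(r"""(?x)
    I\ want\ to\ match\ this\                    # opening words
    \(this\ is\ an\ example\)\                   # parenthesised part
    regular\ expression\ to\ a\ giant\ string    # rest of the sentence
""")

plain_match = plain.match(text)
verbose_match = verbose.match(text)

# Compare the captured text directly instead of the printed match objects.
print(len(plain_match.group(0)), len(verbose_match.group(0)))
print(plain_match.group(0) == verbose_match.group(0))
```

If the two group(0) values are equal, the comments in the verbose pattern are not reducing what is captured, at least for this example.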
I used Python 3.8.1 for the example above and Python 3.7.2 for other examples, including ones with verbose regexes (I can't share those because they are proprietary).
Any help on the mechanics of Python's regex engine and why this happens would be very helpful. If there is a maximum length that can be captured via regex, why does it exist?