The re.DEBUG flag offers a peek at the inner workings of a regular expression pattern in Python. For example:
import re
re.compile(r"(a(?:b)){1,3}(c)", re.DEBUG)
This prints:
MAX_REPEAT 1 3
  SUBPATTERN 1 0 0
    LITERAL 97
    LITERAL 98
SUBPATTERN 2 0 0
  LITERAL 99

 0. INFO 4 0b0 3 7 (to 5)
 5: REPEAT 11 1 3 (to 17)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   LITERAL 0x62 ('b')
15.   MARK 1
17: MAX_UNTIL
18. MARK 2
20. LITERAL 0x63 ('c')
22. MARK 3
24. SUCCESS
Where can I find the meaning of the opcodes (SUBPATTERN, MAX_REPEAT, etc.)? Some of them are self-explanatory, but the overall picture is unclear. For instance, what does 1 0 0 mean in SUBPATTERN 1 0 0?
Some things I've tried:
- Read the docs on re.DEBUG
- Read the source code of the parser.
- Searched Google
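For anyone who wants to poke at this the same way, here is a minimal sketch of how the parse tree behind the first half of the debug output can be inspected directly. It assumes CPython's private parser module (re._parser on 3.11+, sre_parse on older versions), which is undocumented and may change between releases. The helper name dump_nodes is my own, not part of the library.

```python
import re

try:
    # Python 3.11+ keeps the parser in a private submodule (assumption:
    # this is an implementation detail, not a public API).
    from re import _parser as sre_parse
except ImportError:
    # Older versions expose the same parser as a top-level module.
    import sre_parse


def dump_nodes(pattern):
    """Return (opcode name, raw argument) pairs for the top-level nodes."""
    tree = sre_parse.parse(pattern)
    # tree.data is the list of (opcode, argument) tuples that re.DEBUG walks.
    return [(str(op), av) for op, av in tree.data]


nodes = dump_nodes(r"(a(?:b)){1,3}(c)")
for name, av in nodes:
    print(name, av)
```

The numbers printed after each opcode in the debug output seem to be the non-nested entries of this argument tuple, so 1 0 0 would be the first three entries of the SUBPATTERN argument. As far as I can tell from the parser source, those are the group number followed by two flag fields, but I could not find this documented anywhere.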
Note: I know that perhaps this is not a perfect fit for a StackOverflow question, but I've written a clear problem with an MRE and my efforts at solving the issue at hand. Moreover, I think having this solved benefits the other users as well.