How to disassemble regex objects to compare performance

Asked Aug 17 '20 at 07:40

Active Aug 17 '20 at 08:29

Given the following string:

1234

1234

We can use `re.search(r'^\d{4}\Z')` to find only the last four digits, or `re.search(r'^\d{4}(?![\r\n])')`. How do I get a bytecode analysis (maybe with the `dis` module?) to compare which one is more efficient? I know I could use `timeit`, but I am really interested in the internals.
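To make the setup concrete, here is a minimal sketch of what both patterns match. It assumes the input is the two-line string `"1234\n1234"` and that `re.M` is passed so that `^` can match at the start of the second line; neither detail is spelled out above, and the variable names are only illustrative.

```python
import re

# Assumption: the sample input is two lines of four digits each.
text = "1234\n1234"

# Assumption: re.M is used so that ^ matches at the start of every line.
pattern_z = re.compile(r'^\d{4}\Z', re.M)
pattern_lookahead = re.compile(r'^\d{4}(?![\r\n])', re.M)

match_z = pattern_z.search(text)
match_lookahead = pattern_lookahead.search(text)

# Both patterns skip the first line and match only the last four digits.
print(match_z.span(), match_lookahead.span())  # prints (5, 9) (5, 9)
```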

edited Aug 17 '20 at 08:29

asked Aug 17 '20 at 07:40

Jan

If you are interested in the internals of regex (as opposed to Python itself), you might want to try the regex debugger feature of regex101.com (which shows you step by step how the regex is evaluated). – Lukor Aug 17 '20 at 07:44

@Lukor: I know, but I am really interested in how `\Z` is translated into bytecode. – Jan Aug 17 '20 at 07:51

In Python, a regex pattern is debugged with the `re.DEBUG` flag: `re.search(r'^\d{4}\Z', s, re.DEBUG)`. However, `\Z` and `$(?!(?s:.))` are really very close in performance. Plain `\Z` is the solution if you do not want to match before the final newline. – Wiktor Stribiżew Aug 17 '20 at 08:20
@WiktorStribiżew: It is, I close it myself as a duplicate. – Jan Aug 17 '20 at 08:28
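
The `re.DEBUG` suggestion from the comments can be sketched as follows. This is only an illustration: `debug_dump` is a hypothetical helper, not part of the `re` module, and `re.M` is an assumption carried over from the question's use of `^`. The flag makes `re.compile` print the parsed pattern to standard output, so the helper captures that text to allow a side-by-side comparison. Depending on the CPython version, the output may also include a numbered listing of the compiled opcodes.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Hypothetical helper: return the text re.DEBUG prints while compiling."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # re.DEBUG writes its dump to sys.stdout during compilation.
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


# Assumption: re.M mirrors the multi-line use of ^ in the question.
dump_z = debug_dump(r'^\d{4}\Z', re.M)
dump_lookahead = debug_dump(r'^\d{4}(?![\r\n])', re.M)

print(dump_z)
print(dump_lookahead)
```

In the first dump, `\Z` should show up as a single `AT AT_END_STRING` entry, while the second dump contains an `ASSERT_NOT` entry wrapping a character set for `\r` and `\n`. The standard `dis` module works on Python code objects, so it is not the tool for inspecting a compiled pattern.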
