Browse and extract character classes from a regular expression

Question

My problem is a bit backwards: I have a regular expression and would like to extract the possible matches from it. I don't have a string yet; I just want to know what would match. For example:

import re
license_re = re.compile(r"^[0-9]{3}[A-Z][0-9]{3}$")

I know that when using re.DEBUG, the list of character classes is displayed in order, along with any literal characters. That is exactly what I'd like to get: a list of objects representing the "parts" of my regular expression. The first would represent the beginning of the string, the next a character class covering 0 to 9 repeated three times, and so on.

Is that possible at all using regular expressions? I know it's supposed to be the other way around.

Thanks for your help.

See [Reversing a regular expression in Python](https://stackoverflow.com/questions/492716/reversing-a-regular-expression-in-python) — Wiktor Stribiżew, Jul 17 '17 at 10:11
I've mostly used `itertools.product` with some hand-made part lists to do this. Thanks @WiktorStribiżew for the link, very useful. — Oleksii Filonenko, Jul 17 '17 at 10:26
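The `itertools.product` approach mentioned in the comment above could look roughly like this. It is only a sketch: the part lists are written by hand for the pattern in the question, and the names `parts` and `candidates` are made up for illustration.

```python
import itertools
import re
import string

pattern = re.compile(r"^[0-9]{3}[A-Z][0-9]{3}$")

# Hand-made part lists, one entry per character position of the pattern:
# three digits, one uppercase letter, three digits.
parts = [string.digits] * 3 + [string.ascii_uppercase] + [string.digits] * 3


def candidates():
    """Lazily yield every string that can be built from the part lists."""
    for combo in itertools.product(*parts):
        yield "".join(combo)


# There are 10**3 * 26 * 10**3 candidates, so only look at the first few.
first_five = list(itertools.islice(candidates(), 5))
print(first_five)  # ['000A000', '000A001', '000A002', '000A003', '000A004']
```

Because `candidates()` is a generator, nothing is materialized until it is consumed, which matters here since the full set has 26 million entries.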

score 0 · Answer 1 · answered Jul 17 '17 at 10:32

This was very helpful, thank you. In case anyone needs it, I'll post what I found. The way to do this is to walk the parse tree of the regular expression. This is not straightforward and needs a bit of additional coding, but if your use case is simple (like mine), you don't have to worry too much about branching or other advanced features.

>>> re.sre_parse.parse("^[0-9]{3}[A-Z][0-9]{3}$").data
[('at', 'at_beginning'), ('max_repeat', (3, 3, [('in', [('range', (48, 57))])])), ('in', [('range', (65, 90))]), ('max_repeat', (3, 3, [('in', [('range', (48, 57))])])), ('at', 'at_end')]
>>>

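So the information is all here; we just need to handle it as neatly as possible. Some third-party libraries allow random generation from basic regular expressions, but this works out of the box with the standard library, probably even in very old Python versions. Be aware that sre_parse is an undocumented internal module: recent Python 3 versions print named constants such as MAX_REPEAT instead of the strings shown above, and since Python 3.11 the parser lives in re._parser, with sre_parse kept only as a deprecated alias.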
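As a rough illustration of handling that parse result, here is a minimal sketch that turns it into a flat list of parts. The function names `class_chars` and `describe` and the tuple layout are my own choices, not part of any library, and the code relies on the undocumented parser module, so it may break between Python versions. It only covers anchors, literals, simple character classes, and counted repeats of a single item, and raises `NotImplementedError` for anything else.

```python
try:
    # Python 3.11+: the parser is the private module re._parser.
    from re import _parser as sre_parse
except ImportError:
    # Older versions: the same code is the top-level sre_parse module.
    import sre_parse


def class_chars(items):
    """Expand the items of a character class into a string of characters."""
    chars = []
    for kind, value in items:
        if kind == sre_parse.RANGE:
            lo, hi = value
            chars.extend(chr(c) for c in range(lo, hi + 1))
        elif kind == sre_parse.LITERAL:
            chars.append(chr(value))
        else:
            # Negation ([^...]) and categories (\d, \w) are not handled here.
            raise NotImplementedError(str(kind))
    return "".join(chars)


def describe(pattern):
    """Return (kind, detail, min_repeat, max_repeat) tuples for a simple pattern."""
    parts = []
    for op, arg in sre_parse.parse(pattern).data:
        lo, hi = 1, 1
        if op == sre_parse.MAX_REPEAT:
            lo, hi, sub = arg
            if len(sub) != 1:
                raise NotImplementedError("repeat of more than one item")
            op, arg = sub[0]  # unwrap the single repeated item
        if op == sre_parse.AT:
            parts.append(("anchor", str(arg).upper(), 1, 1))
        elif op == sre_parse.IN:
            parts.append(("class", class_chars(arg), lo, hi))
        elif op == sre_parse.LITERAL:
            parts.append(("literal", chr(arg), lo, hi))
        else:
            # Branches, groups, wildcards, etc. are outside this sketch.
            raise NotImplementedError(str(op))
    return parts


for part in describe("^[0-9]{3}[A-Z][0-9]{3}$"):
    print(part)
```

For the pattern in the question this yields five parts: an anchor, a digit class repeated three times, an uppercase class, another digit class repeated three times, and a closing anchor. The character strings in the "class" entries can be fed straight into `itertools.product` if you want to enumerate matches.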
