I have a situation where I have thousands of Python regexes that look something like:
r'...(?:uuid)...'
Where ... isn't literally three regex wildcards, but rather stands in for long subpatterns that are slow to compile.
The strings these regexes match may contain multiple uuids from the same set, but for each string only one specific uuid is supposed to match; that's why there is currently a different regex for each string. For example, if the string looks like:
'...uuid1...uuid2...uuid3...'
We might be looking for only uuid2, so it's not a simple matter of just writing a regex like:
r'...(?:uuid1|uuid2|uuid3)...'
The obvious solution would be to compile these regexes in separate processes and return the compiled patterns as pickled objects, but it looks like Python doesn't really support that.
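As far as I can tell, the reason this doesn't help is that the re module pickles a compiled pattern as just its pattern string and flags, so unpickling recompiles it from scratch in the receiving process. A small demo of the round trip (with a placeholder pattern):

import pickle
import re

# Placeholder for one of the slow patterns; any pattern shows the behaviour.
compiled = re.compile(r'...(?:uuid)...')

# Pickling a compiled pattern stores only (pattern, flags), so the expensive
# compilation work is not actually preserved in the pickle.
payload = pickle.dumps(compiled)

# Unpickling effectively calls re.compile(pattern, flags) again, so the
# parent process pays the full compilation cost anyway.
restored = pickle.loads(payload)
assert restored.pattern == compiled.pattern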
One solution that might work would be splitting this into 3 regexes:
r'...'
r'(?:uuid)'
r'...'
There would still be thousands of regexes to compile, but at least each would compile somewhat faster. However, I'd have to write a bunch of difficult-to-understand functions to implement the matching, roughly along the lines of the sketch below.
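If I went this route, the matching would presumably look something like this sketch, which assumes the original pattern is a straight concatenation of the three parts and chains them with Pattern.match(string, pos) so each piece has to start where the previous one ended (the pattern contents and names are placeholders). It also glosses over any backtracking between the parts, which is part of what makes this hard to get right:

import re

# Hypothetical split of one original pattern into its three parts.
prefix_re = re.compile(r'...')       # placeholder for the slow leading part
uuid_re = re.compile(r'(?:uuid)')    # the one uuid this string should match
suffix_re = re.compile(r'...')       # placeholder for the slow trailing part

def match_in_three_parts(s):
    # Try each place the prefix can match; the uuid and suffix must then
    # match immediately after the end of the previous part.
    pos = 0
    while True:
        prefix_m = prefix_re.search(s, pos)
        if prefix_m is None:
            return None
        uuid_m = uuid_re.match(s, prefix_m.end())
        if uuid_m is not None:
            suffix_m = suffix_re.match(s, uuid_m.end())
            if suffix_m is not None:
                return prefix_m, uuid_m, suffix_m
        pos = prefix_m.start() + 1  # no full match here, try the next start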
A second solution might be doing something like:
r'...(uuid1|uuid2|uuid3)...'
and then using finditer and, for each match, checking whether it captured the correct uuid, moving on to the next match if not (roughly the sketch below). I don't really like this solution either, but it doesn't seem completely unreasonable to try it and measure the performance.
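A rough sketch of what I mean, with one shared pattern compiled once and each string checked against the uuid it's supposed to contain (the pattern and names are placeholders):

import re

# One shared pattern compiled once, instead of one per string.
shared_re = re.compile(r'...(uuid1|uuid2|uuid3)...')

def find_wanted(s, wanted_uuid):
    # Walk every candidate match and keep the one whose captured group
    # is the uuid this particular string is supposed to match.
    for m in shared_re.finditer(s):
        if m.group(1) == wanted_uuid:
            return m
    return None

One thing I'd have to check is that finditer only returns non-overlapping matches, so a match built around the wrong uuid could consume text that would otherwise have matched around the right one.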
Am I missing a better way of doing this? Frankly, none of these solutions seems that good.