I have a list of regular expressions and a string. I want to know which of the regular expressions (possibly more than one) match the string, if any. The trivial solution would be to try regular expressions one by one, but this part is performance critical... Is there a faster solution? Maybe by combining regular expressions in a single state machine somehow?
Reason: regular expressions are user-supplied filters that match incoming strings. When the message with the string arrives, system needs to perform additional user-specified actions.
There are up to 10000 regular expressions available. Regular expressions are user-supplied and can be somewhat simplified, if necessary (.*
should be allowed though :) ). The list of regex-es is saved in MongoDB, but I can also prefetch them and perform the search inside Python if necessary.
Similar questions:
- similar to this question, but the constraints are different: fuzzy matching is not enough, number of regular expressions is much lower (up to 10k)
- similar to this question, but I can pre-process the regexes if necessary
- similar to this question, but I need to find all matching regular expressions
I would appreciate some help.