I'm working on a Ruby project in which the user can configure a list of regular expressions and associate each expression with some action to be performed. This list of expressions is then tested against some given input string, and the handler(s) of the matching expression(s) will be executed. The code basically looks like this:
@mapping = {
/^[A-Za-z]+$/ => Proc.new { puts "Got word" },
/^\d+$/ => Proc.new { puts "Got number" },
/^.$/ => Proc.new { puts "Got single character" }
}
def run_handlers(s)
@mapping.each { |regexp, handler| handler.call if regexp =~ s.to_s }
end
# Prints 'Got word'
run_handlers("StackOverflow")
# Prints 'Got number' as well as 'Got single character'
run_handlers(4)
Alas, this performs rather poorly when there are many (a couple thousand) handlers. Performing a linear search over all handlers seems a bit wasteful, but since the expressions are user-configurable I cannot exploit some specific traits of the expressions easily. I wonder whether there's a more efficient way to do this.
One idea I was toying with is to merge multiple regular expressions (i.e. multiple state machines) into one big state machine which is able to run more efficiently. I'm aware of the union
method to merge regular expressions, but since I need to execute a different set of handlers depending on which expression match, that does not seem to be of any use.
I was thinking that maybe the expressions can internally be managed like a tree structure (much like a Trie such that the program does not actually need to try all the regular expressions but can quickly drop entire sets of expressions.
Maybe there is some existing code for this kind of problem already? I suppose various web frameworks deal with the same kind of issue when getting a request and having to decide which route to select.