1

I have a list of strings which are patterns with AND and OR operators and wildcards. Now given an input string, return true if it matches any patterns, and false if it does not.

Say, I have 'n' patterns and a query of length 'm' Now, the obvious way is to run a loop and grep for each pattern in the string. This takes O(nm) time.

Now, my question is, is it possible to do better? I was thinking some sort of expression evaluation Finite State Machine maybe? Is there a name/reference implementation of something like that?

Thanks

  • Beware that modern CPUs are blindingly fast looping over linear data structures (especially when the loop fits in on-chip memory and the branches can be predicted) and much slower following pointers and going off-chip for memory access. Whatever you try you should benchmark it against a dumb, brute-force algorithm. – Ian Mercer Nov 15 '16 at 23:04
  • You could certainly create a finite state machine that handles the merged search from all of your patterns. See http://stackoverflow.com/questions/525004/short-example-of-regular-expression-converted-to-a-state-machine for a discussion on turning RegEx into FSM. – Ian Mercer Nov 15 '16 at 23:07

1 Answers1

0

You are looking for the Boyer–Moore string search algorithm.

You might also be able to get good results if you first parse your patterns and build Abstract Syntax Trees out of them, then also parse your query string into another abstract syntax tree, then use a node search (for the root) and tree comparison algorithm (rather trivial) to see if any of your pattern strings is found within your query string. In theory, the parsing of the query string can be done in O(n), but in practice I doubt that it will result in any better performance. It might be an interesting exercise though.

Mike Nakis
  • 56,297
  • 11
  • 110
  • 142
  • 1
    I think you've misunderstood the OP (or maybe I have); I think the OP *is* using "pattern" to refer to what is being searched for. I think there are many strings-to-search-for, and only one string-to-search-in. (Also, the "patterns" seem to be much more complicated than simple substrings.) – ruakh Nov 15 '16 at 23:05