I need to match any of a list of strings, and I'm wondering if I can just use a regular expression that is something like "item1|item2|item3|..." instead of doing a separate `strstr()` for each string. But the list can be fairly large - up to 10000 items. Would a regex work well with that? Would it be faster than searching for each string separately?

sashoalm
- Be careful, because not every compiler has a working implementation of `<regex>`. AFAIK, [GCC only has partial support](http://stackoverflow.com/a/15059522/1174378). – Mihai Todor Mar 19 '13 at 15:34
- I didn't know about that. I assume I could use boost::regex on GCC though? – sashoalm Mar 19 '13 at 15:37
- Yes, but it is not a plug & play replacement, unfortunately. – Mihai Todor Mar 19 '13 at 16:24
1 Answer
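A regex will work, and it should be faster than searching for each string separately, at least with an engine that compiles the alternation into an automaton (a backtracking engine may end up trying the alternatives one by one). I'm not sure how much memory or time the initial setup will take with 10000 input patterns, though.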
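For illustration, here is a minimal sketch of the regex approach using `std::regex`. The helper names `escape_literal`, `build_alternation` and `contains_any` are made up for this example. It assumes the items should be matched literally, so metacharacters are escaped before joining with `|`. Whether this scales to 10000 items depends on the regex implementation, so measure it against your real data.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Escape ECMAScript regex metacharacters so an item is matched literally.
// (Hypothetical helper, not part of any library.)
std::string escape_literal(const std::string& s) {
    static const std::string meta = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Build the pattern "item1|item2|item3|..." and compile it once.
// Assumes the list contains no empty strings (an empty alternative
// would match everywhere).
std::regex build_alternation(const std::vector<std::string>& items) {
    std::string pattern;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) pattern += '|';
        pattern += escape_literal(items[i]);
    }
    return std::regex(pattern, std::regex::ECMAScript | std::regex::optimize);
}

// True if any of the items occurs somewhere in the text.
bool contains_any(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re);
}
```

Compile the regex once and reuse it for every search; rebuilding it per call would throw away any advantage over the `strstr()` loop.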
However, this is a well-known problem (matching many strings at once), and there are a lot of algorithms designed specifically for it. They all have different trade-offs, so pick your poison.
In our project we needed multi-string replacement, so we chose the Aho-Corasick algorithm and built the replace function on top of it.
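To give an idea of what that involves, below is a rough sketch of an Aho-Corasick matcher. It is not the code from our project. The `AhoCorasick` class, the `Match` struct and all member names are invented for this example. It works on raw bytes, skips empty patterns, and keeps only the last index when the same pattern appears twice.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Match {
    std::size_t start;    // offset of the occurrence in the text
    std::size_t pattern;  // index into the pattern list
};

class AhoCorasick {
public:
    explicit AhoCorasick(const std::vector<std::string>& patterns) {
        nodes_.emplace_back();  // node 0 is the root
        for (std::size_t i = 0; i < patterns.size(); ++i) {
            lengths_.push_back(patterns[i].size());
            insert(patterns[i], static_cast<int>(i));
        }
        build_links();
    }

    // Reports every occurrence of every pattern in a single pass over text.
    std::vector<Match> find_all(const std::string& text) const {
        std::vector<Match> out;
        int state = 0;
        for (std::size_t pos = 0; pos < text.size(); ++pos) {
            state = step(state, static_cast<unsigned char>(text[pos]));
            // Visit this state (if it ends a pattern) and then every
            // shorter pattern that is a suffix of it.
            int s = nodes_[state].pattern >= 0 ? state : nodes_[state].dict;
            for (; s != 0; s = nodes_[s].dict) {
                const int p = nodes_[s].pattern;
                out.push_back(Match{pos + 1 - lengths_[p],
                                    static_cast<std::size_t>(p)});
            }
        }
        return out;
    }

private:
    struct Node {
        std::map<unsigned char, int> next;  // trie edges
        int fail = 0;      // longest proper suffix that is also a trie path
        int dict = 0;      // nearest suffix state ending a pattern (0 = none)
        int pattern = -1;  // index of the pattern ending here, or -1
    };

    void insert(const std::string& p, int index) {
        if (p.empty()) return;
        int cur = 0;
        for (unsigned char c : p) {
            auto it = nodes_[cur].next.find(c);
            if (it == nodes_[cur].next.end()) {
                nodes_.emplace_back();
                const int created = static_cast<int>(nodes_.size()) - 1;
                it = nodes_[cur].next.emplace(c, created).first;
            }
            cur = it->second;
        }
        nodes_[cur].pattern = index;
    }

    // Follow the edge for c, falling back along failure links as needed.
    int step(int state, unsigned char c) const {
        for (;;) {
            auto it = nodes_[state].next.find(c);
            if (it != nodes_[state].next.end()) return it->second;
            if (state == 0) return 0;
            state = nodes_[state].fail;
        }
    }

    // Breadth-first pass that fills in the fail and dict links.
    void build_links() {
        std::queue<int> q;
        for (const auto& kv : nodes_[0].next) q.push(kv.second);
        while (!q.empty()) {
            const int u = q.front();
            q.pop();
            for (const auto& kv : nodes_[u].next) {
                const int v = kv.second;
                const int f = step(nodes_[u].fail, kv.first);
                nodes_[v].fail = f;
                nodes_[v].dict = nodes_[f].pattern >= 0 ? f : nodes_[f].dict;
                q.push(v);
            }
        }
    }

    std::vector<Node> nodes_;
    std::vector<std::size_t> lengths_;
};
```

Searching takes one pass over the text regardless of how many patterns there are, plus the work of reporting matches. A replace function can be built on `find_all` by walking the matches in order and copying the text between them, with some rule for overlapping matches. The `std::map` per node keeps memory down for large pattern sets at the cost of slower edge lookups than a flat 256-entry table.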

Vladimir Sinenko