4

I need to find many substring in a string. I downloand an internet page and put it into a string. Then I've to see if the page contains some string (substring).

Now I'm using regex whit the boost library, because I use it to use the regex pattern ([0-9] ect..).

The question is: If I need only to find a substring in a string, which is the fastest way?

Carme
  • 141
  • 1
  • 9
  • Stating that the linked "duplicate" has anything of practical value for this bog standard everyday case is borderline hypocrisy... That other page is practically an academic/research discussion. It has NOTHING whatsoever to do with this question, which is essentially about just what C++ functions to call for some HTML scraping... – Sz. Jan 16 '22 at 00:22

1 Answers1

2

There are algorithms for substring searching. Here you can find comparison with example code: http://old.blog.phusion.nl/2010/12/06/efficient-substring-searching/

Boyer-Moore-Horspool wins the benchmark. https://en.wikipedia.org/wiki/Boyer–Moore–Horspool_algorithm

paweldac
  • 1,144
  • 6
  • 11
  • 4
    Link only answers are not helpful, as links can rot. Please consider posting some relevant explanation in addition to the link. – abhishek_naik Jun 05 '16 at 11:47
  • @BatCoder ok. So in general there are few algorithms that do the task from topic: Boyer-Moore, Boyer-Moore-Horspool, Turbo Boyer-Moore, Knuth-Morris-Pratt. All of them use different techniques in order to find substring. Boyer-Moore is using bad character table with good suffix table, Boyer-Moore-Horspool is using only bad character table, Turbo Boyer-Moore just takes less steps in comparison to origin Boyer-Moore, Knuth-Morris-Pratt is based on partial mach table. Comparison was won by Boyer-Moore-Horspool. And like you said there is similar stackoverflow question with brief explanation. – paweldac Jun 05 '16 at 12:03