I tried to follow this answer but it doesn't work when there are multiple occurrences of the same word.

I want to count the occurrences of both site and site web in the string "site web site".

I tried the following code:

var regex = /(?:\b)((?=(site))(?=(site web)))(?:\b)/;
var string = 'site web site';
var match = string.match( regex ).filter(Boolean);

console.log(match)

This code returns ["site", "site web"], but I want it to return ["site", "site", "site web"], since site appears twice in the string.

Note: in my case, I have hundreds of words to match.

Also, if the input is site webS site, the expected output is ["site", "site"], because only whole words should match. The input will be full text, so punctuation (.,?!/;...) has to be taken into account.

Valentin Duboscq

1 Answer

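If you need to check whether the words in a list appear in a string, and you have hundreds of words, you need a good string-searching algorithm. I think the best option for your use case is the Aho-Corasick algorithm. It runs in close to O(n) time in the length of the text (more precisely, linear in the text length plus the total length of the words plus the number of matches), which is much faster than running a separate regexp for each word.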
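To give an idea of what this looks like in JavaScript, here is a minimal sketch of Aho-Corasick with a whole-word check added on top. The names buildAutomaton, findWords and isWordChar are made up for this example and do not come from any library, and the word-boundary test assumes that "word characters" are ASCII letters, digits and underscore (the same as \w), so adjust it if your text contains accented letters.

```javascript
// Build the trie plus failure links (hypothetical helper, not a library API).
// Each node holds: next (Map char -> node index), fail (node index),
// out (every word that ends when this node is reached).
function buildAutomaton(words) {
  const nodes = [{ next: new Map(), fail: 0, out: [] }];
  for (const word of words) {
    let state = 0;
    for (let i = 0; i < word.length; i++) {
      const ch = word[i];
      if (!nodes[state].next.has(ch)) {
        nodes.push({ next: new Map(), fail: 0, out: [] });
        nodes[state].next.set(ch, nodes.length - 1);
      }
      state = nodes[state].next.get(ch);
    }
    nodes[state].out.push(word);
  }
  // Breadth-first pass: children of the root keep fail = 0,
  // deeper nodes derive their failure link from their parent's.
  const queue = [...nodes[0].next.values()];
  for (let head = 0; head < queue.length; head++) {
    const state = queue[head];
    for (const [ch, child] of nodes[state].next) {
      let f = nodes[state].fail;
      while (f !== 0 && !nodes[f].next.has(ch)) f = nodes[f].fail;
      nodes[child].fail = nodes[f].next.get(ch) ?? 0;
      // Inherit the words that end at the failure target (shorter suffixes).
      nodes[child].out.push(...nodes[nodes[child].fail].out);
      queue.push(child);
    }
  }
  return nodes;
}

// Assumption: a "word character" is what \w matches (ASCII only).
const isWordChar = (ch) => ch !== undefined && /[A-Za-z0-9_]/.test(ch);

// Scan the text once and return every whole-word match,
// in the order in which the matches end.
function findWords(text, words) {
  const nodes = buildAutomaton(words);
  const found = [];
  let state = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    while (state !== 0 && !nodes[state].next.has(ch)) state = nodes[state].fail;
    state = nodes[state].next.get(ch) ?? 0;
    for (const word of nodes[state].out) {
      const start = i - word.length + 1;
      // Keep the match only if it is not glued to another word character.
      if (!isWordChar(text[start - 1]) && !isWordChar(text[i + 1])) {
        found.push(word);
      }
    }
  }
  return found;
}

console.log(findWords("site web site", ["site", "site web"]));
// [ 'site', 'site web', 'site' ]
console.log(findWords("site webS site", ["site", "site web"]));
// [ 'site', 'site' ]
```

The matches come out in the order they end in the text, not grouped by word, so sort the array or tally it into a Map if you only need counts. For hundreds of words you would build the automaton once and reuse it across texts instead of rebuilding it on every call as this sketch does.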

Check this link:

Aho Corasick

I have used it and can tell you it is very fast. If you decide to use it, there are multiple implementations of this algorithm available on GitHub. You can look for a good one in your programming language and use it.

I hope this helps.

David Zamora