
I have the following code. We run it on a large amount of data, but it becomes quite slow as the number of data points grows: profiling shows that 90% of our total execution time is spent in the code below.

We use it to detect whether a text contains any of our provided synonyms. In this case we are using it to match data from another data source against our own data.

# returns true or false if the scan string can be found in the text
# see: http://rubular.com/r/GFX6nE9r8u
#
def scan_name_for_whole_word(text, scan_string)
  # barcelona, spain - not OK
  # barcelona the spa - OK
  # barcelona spa centers - OK
  # spa in barcelona - OK
  # spa - OK
  # Look for a whole word
  !!(text =~ /(\W#{scan_string}$|^#{scan_string}\W|\W#{scan_string}\W|^#{scan_string}$|^#{scan_string}[s])/i)
end

# Measuring this:
require 'benchmark'

bm = Benchmark.measure do
  100000.times {
    scan_name_for_whole_word("barcelona the spa", "spa")
  }
end

# 2.710000   0.040000   2.750000 (  2.755910)
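For reference, the five alternations above are essentially a hand-rolled word boundary. A sketch of one possible speedup (an assumed rewrite, not part of the original post): collapse them into a single `\b`-anchored pattern and build the `Regexp` once per scan string instead of on every call.

```ruby
# Sketch of a faster variant (assumption, not the original method):
# \b already means "start/end of string or next to a non-word character",
# so the five alternations collapse into one pattern. Regexp.escape guards
# against metacharacters in scan_string, and the compiled Regexp is cached
# so it is built once per scan string rather than 100,000 times.
REGEX_CACHE = {}

def scan_name_for_whole_word_fast(text, scan_string)
  pattern = REGEX_CACHE[scan_string] ||= /\b#{Regexp.escape(scan_string)}s?\b/i
  !!(text =~ pattern)
end
```

The `s?` keeps the plural case, but unlike the original `^...[s]` branch it accepts a plural anywhere in the text (and requires a word boundary after the `s`), so the edge-case behaviour differs slightly; check it against your data before swapping it in.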

Questions:

  • Is this the correct approach?
  • Is there a way to make this faster?
  • If so, how could I make it faster?
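Since the post mentions a set of synonyms, another direction worth sketching (under the assumption that the same text is checked against many scan strings, which the original does not show): combine the synonyms into one alternation so each text is scanned once instead of once per synonym. The `synonyms` list and `contains_synonym?` name here are illustrative.

```ruby
# Hypothetical sketch: test a whole synonym list in a single pass.
# The synonyms array is made up for illustration.
synonyms = ["spa", "wellness", "sauna"]

# Each synonym is escaped and joined into one alternation; the \b anchors
# keep the whole-word requirement from the question, and /i keeps the
# match case-insensitive.
SYNONYM_PATTERN = /\b(?:#{synonyms.map { |s| Regexp.escape(s) }.join("|")})\b/i

def contains_synonym?(text)
  !!(text =~ SYNONYM_PATTERN)
end
```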
Hendrik

0 Answers