
I have the following code. We run it on a large amount of data, but it becomes quite slow as the number of data points grows: profiling shows that 90% of our total execution time is spent in the code below.

We use it to detect whether a text contains any of our provided synonyms. In this case we are using it to match data from another data source against our own data.

# returns true or false if the scan string can be found in the text
# see: http://rubular.com/r/GFX6nE9r8u
#
def scan_name_for_whole_word(text, scan_string)
  # barcelona, spain - not OK
  # barcelona the spa - OK
  # barcelona spa centers - OK
  # spa in barcelona - OK
  # spa - OK
  # Look for a whole word
  !!(text =~ /(\W#{scan_string}$|^#{scan_string}\W|\W#{scan_string}\W|^#{scan_string}$|^#{scan_string}[s])/i)
end

# Measuring this:
require 'benchmark'

bm = Benchmark.measure do
  100000.times {
    scan_name_for_whole_word("barcelona the spa", "spa")
  }
end

# 2.710000   0.040000   2.750000 (  2.755910)
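For reference, the five alternations above are essentially a hand-rolled word boundary. A sketch of one possible speedup (an assumed rewrite, not part of the original post): collapse them into a single `\b`-anchored pattern and build the `Regexp` once per scan string instead of on every call.

```ruby
# Sketch of a faster variant (assumption, not the original method):
# \b already means "start/end of string or next to a non-word character",
# so the five alternations collapse into one pattern. Regexp.escape guards
# against metacharacters in scan_string, and the compiled Regexp is cached
# so it is built once per scan string rather than 100,000 times.
REGEX_CACHE = {}

def scan_name_for_whole_word_fast(text, scan_string)
  pattern = REGEX_CACHE[scan_string] ||= /\b#{Regexp.escape(scan_string)}s?\b/i
  !!(text =~ pattern)
end
```

The `s?` keeps the plural case, but unlike the original `^...[s]` branch it accepts a plural anywhere in the text (and requires a word boundary after the `s`), so the edge-case behaviour differs slightly; check it against your data before swapping it in.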

Questions:

  • Is this the correct approach?
  • Is there a way to make this faster?
  • If so, how could I make it faster?
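Since the post mentions a set of synonyms, another direction worth sketching (under the assumption that the same text is checked against many scan strings, which the original does not show): combine the synonyms into one alternation so each text is scanned once instead of once per synonym. The `synonyms` list and `contains_synonym?` name here are illustrative.

```ruby
# Hypothetical sketch: test a whole synonym list in a single pass.
# The synonyms array is made up for illustration.
synonyms = ["spa", "wellness", "sauna"]

# Each synonym is escaped and joined into one alternation; the \b anchors
# keep the whole-word requirement from the question, and /i keeps the
# match case-insensitive.
SYNONYM_PATTERN = /\b(?:#{synonyms.map { |s| Regexp.escape(s) }.join("|")})\b/i

def contains_synonym?(text)
  !!(text =~ SYNONYM_PATTERN)
end
```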
Hendrik

0 Answers