I have the following code, which we run on a large amount of data. However, it becomes quite slow as the number of data points grows: profiling shows that roughly 90% of our total execution time is spent in the code below.
We use it to detect whether a text contains any of our provided synonyms. In this case we are using it to match data from another data source to our data.
# Returns true or false depending on whether the scan string can be
# found in the text as a whole word.
# See: http://rubular.com/r/GFX6nE9r8u
#
def scan_name_for_whole_word(text, scan_string)
  # "barcelona, spain"      - not OK
  # "barcelona the spa"     - OK
  # "barcelona spa centers" - OK
  # "spa in barcelona"      - OK
  # "spa"                   - OK
  # Look for a whole word
  !!(text =~ /(\W#{scan_string}$|^#{scan_string}\W|\W#{scan_string}\W|^#{scan_string}$|^#{scan_string}[s])/i)
end
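For reference, here is a more compact variant I sketched of the same whole-word check (the method name is mine, and it is not exactly equivalent: the original's `^#{scan_string}[s]` branch also matches the scan string followed by "s" plus further letters at the start of the text, which this version does not). It collapses the five alternatives into one pattern and escapes the scan string in case it ever contains regex metacharacters:

```ruby
# Sketch: the scan string must be preceded by start-of-string or a
# non-word character, and followed by an optional plural "s" and then
# a non-word character or end-of-string. Regexp.escape is an addition
# here; the original interpolates scan_string raw.
def scan_name_for_whole_word_compact(text, scan_string)
  !!(text =~ /(\A|\W)#{Regexp.escape(scan_string)}s?(\W|\z)/i)
end
```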
# Measuring this:
require 'benchmark'

bm = Benchmark.measure do
  100_000.times {
    scan_name_for_whole_word("barcelona the spa", "spa")
  }
end
# =>  2.710000   0.040000   2.750000 (  2.755910)
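One variant I have sketched but not yet measured on our real data (the method name is mine): short-circuit with a plain substring check before running the regex. `String#include?` is a simple substring search, so when most texts do not contain the scan string at all, the regex never runs:

```ruby
# Sketch of a possible speedup: bail out with a cheap substring check
# before paying for the regex. Both sides are downcased to mirror the
# /i flag on the regex. The regex itself is unchanged from the
# original version above.
def scan_name_for_whole_word_prefiltered(text, scan_string)
  return false unless text.downcase.include?(scan_string.downcase)
  !!(text =~ /(\W#{scan_string}$|^#{scan_string}\W|\W#{scan_string}\W|^#{scan_string}$|^#{scan_string}[s])/i)
end
```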
Questions:
- Is this the correct approach?
- Is there a way to make this faster, and if so, how?