0

For sake of an example, I have the following string:

"Federal INSURANCE Mortgage"

I want to check if any word in this string is present in the following array:

BAD_WORDS = %w{LLC CORPORATION INSURANCE COMPANY GEICO}

In our example, INSURANCE is present. So it should return true. This is what I have done:

BAD_WORDS = %w{LLC CORPORATION INSURANCE COMPANY GEICO}
BAD_WORDS.detect {|word| "Federal INSURANCE Mortgage".index(word) }.present?
=> true 
BAD_WORDS.detect {|word| "Federal Mortgage".index(word) }.present?
=> false

Is this the most proficient way to solve this problem in Ruby?

Daniel Viglione
  • 8,014
  • 9
  • 67
  • 101

2 Answers2

2

This is best done with a regular expression, and assembling one for this task is easy with Regexp.union:

BAD_WORDS = %w{LLC CORPORATION INSURANCE COMPANY GEICO}
BAD_WORDS_RX = Regexp.union(*BAD_WORDS)

"Federal INSURANCE Mortgage".match(BAD_WORDS_RX)
# => #<MatchData "INSURANCE">

Now this will also do partial word matches, which might be undesirable, but the words in your example are all quite unique.

Your approach involves iterating over the words and additionally iterating over an array of bad words. This is N*M complexity, or in other words, it's geometrically slow. As your strings become longer or the bad list gets bigger it will get painfully expensive.

A regular expression is very low cost once created, and the creation cost is nominal.

One minor improvement to your original is to use a Set instead of an Array. These have constant lookup time.

tadman
  • 208,517
  • 23
  • 234
  • 262
1

Both your way and the answer (that is now deleted) will loop through the input, and loop through words for each input word making the run time O(n2). If you have a large input and lots of words that could get slow.

The ruby array intersection method uses a hash under the covers, so it can do the same job in O(n).

("Federal INSURANCE Mortgage".split & BAD_WORDS).any?

See here: Computing set intersection in linear time?

Community
  • 1
  • 1
Ehren Murdick
  • 426
  • 3
  • 8