RegexpTrie, which wasn't around when I last looked for something like it, helps with this sort of problem:
require 'regexp_trie'
sentence = 'life on the mississippi'
words_ary = %w[the sip life]
words_regex = /\b(?:#{RegexpTrie.union(words_ary, option: Regexp::IGNORECASE).source})\b/i
# => /\b(?:(?:the|sip|life))\b/i
words_to_ints = words_ary.each_with_index.to_h
# => {"the"=>0, "sip"=>1, "life"=>2}
sentence_words = sentence.split
# => ["life", "on", "the", "mississippi"]
word_hits = sentence_words.map { |w| w[words_regex] }
# => ["life", nil, "the", nil]
nil means that word did not match the regular expression.
words_to_ints.values_at(*word_hits)
# => [2, nil, 0, nil]
Again, nil means there was no match. nil values could be ignored using:
word_hits = sentence_words.map { |w| w[words_regex] }.compact
# => ["life", "the"]
words_to_ints.values_at(*word_hits)
# => [2, 0]
Similarly, if you want to scan a sentence for word matches instead of individual words:
require 'regexp_trie'
sentence = 'life on the mississippi'
words = %w[the sip life]
words_regex = /\b(?:#{RegexpTrie.union(words, option: Regexp::IGNORECASE).source})\b/i
# => /\b(?:(?:the|sip|life))\b/i
words_to_ints = words.each_with_index.to_h
# => {"the"=>0, "sip"=>1, "life"=>2}
word_hits = sentence.scan(words_regex)
# => ["life", "the"]
words_to_ints.values_at(*word_hits)
# => [2, 0]
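One thing to watch for: the regex is case-insensitive, but the keys in words_to_ints are lowercase, so a hit such as "Life" would look up as nil. Here is a minimal sketch of normalizing the hits before the lookup. It is only an illustration: the mixed-case sentence and the word_ints name are made up for this example, and Regexp.union from the standard library stands in for RegexpTrie.union so the snippet runs without the gem.

```ruby
# Stdlib-only sketch; Regexp.union is a stand-in for RegexpTrie.union here.
sentence = 'Life on THE Mississippi'   # made-up mixed-case input
words = %w[the sip life]
words_regex = /\b(?:#{Regexp.union(words).source})\b/i
words_to_ints = words.each_with_index.to_h

# scan returns the hits with the casing found in the sentence...
word_hits = sentence.scan(words_regex)
# => ["Life", "THE"]

# ...so downcase them before using them as hash keys.
word_ints = words_to_ints.values_at(*word_hits.map(&:downcase))
# => [2, 0]
```

Without the downcase step, values_at would return [nil, nil] for this input, which looks the same as "no match".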
Perl has a really useful module for this sort of thing called Regexp::Assemble, which lets you combine regexes into one big one, then search a string, returning the hits. You can ask it to tell you which pattern matched if you want to know.
Ruby doesn't have such a module, but this gets kinda close:
patterns = {
  /(foo)/ => 1,
  /(bar)/ => 2
}
pattern_union = Regexp.union(patterns.keys)
pattern_union # => /(?-mix:(foo))|(?-mix:(bar))/
str = 'foo some text'
if (pattern_union =~ str)
  # these show what is being processed...
  pattern_union.match(str).captures # => ["foo", nil]
  pattern_union.match(str).captures.zip(patterns.keys).find_all{ |c| c[0] }.map{ |c| c[1] } # => [/(foo)/]
  # process it...
  matched_pattern_values = patterns.values_at(*pattern_union.match(str).captures.zip(patterns.keys).find_all{ |c| c[0] }.map{ |c| c[1] })
  # here's what we got
  matched_pattern_values # => [1]
end
There's probably a way to do it in one line, but this works.
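For what it's worth, here is one possible way to shorten it. This is only a sketch: it assumes every pattern contains exactly one capture group, so that the captures line up one-to-one with the hash entries, and the names m and matched_values are made up for the example.

```ruby
patterns = {
  /(foo)/ => 1,
  /(bar)/ => 2
}
pattern_union = Regexp.union(patterns.keys)
str = 'foo some text'

# Match once and reuse the MatchData instead of matching repeatedly.
m = pattern_union.match(str)

# Pair each value with its pattern's capture, then keep the values whose
# capture is non-nil. Assumes exactly one capture group per pattern.
matched_values = m ? patterns.values.zip(m.captures).select { |_, c| c }.map(&:first) : []
# => [1]
```

If a pattern has zero or several capture groups the pairing drifts out of step, so this needs adjusting for anything more complicated.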
I think it's important to avoid iterating over patterns to look for hits in strings if at all possible, because such loops can slow down badly as the size of the text or the number of patterns increases.
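To make the difference concrete, here is a rough sketch of the two shapes, using the same word list as above. The variable names are made up for illustration, and it only shows that both approaches find the same words; how much faster the combined pattern is depends on the input, so measure with your own data.

```ruby
text = 'life on the mississippi'
words = %w[the sip life]

# Looping: the text is searched once per pattern, so the work grows
# with the number of patterns.
looped_hits = words.select { |w| text.match?(/\b#{Regexp.escape(w)}\b/i) }
# => ["the", "life"]

# Combined: one pattern built from all the words, one pass over the text.
union_regex = /\b(?:#{Regexp.union(words).source})\b/i
union_hits = text.scan(union_regex).map(&:downcase).uniq
# => ["life", "the"]
```

The hits come back in a different order (word-list order versus order of appearance in the text), but the set of matched words is the same.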
See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for more about using Regexp::Assemble from Ruby.