Solution 1
Do this:
r = patterns.select{|pattern| content =~ pattern}
Since the string is huge, it is better to implement this method on String
rather than on something else, because passing a large argument seems to be slow.
class String
def filter_patterns patterns
patterns.select{|r| self =~ r}
end
end
and use it like:
content.filter_patterns(patterns)
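A minimal runnable sketch of the same filtering idea, for reference. The `patterns` and `content` values here are made-up sample data, and it swaps `=~` for `Regexp#match?` (available from Ruby 2.4), which returns a plain boolean and skips setting the match globals:

```ruby
# Hypothetical sample data, only for illustration.
patterns = [/saluki/i, /greyhound/i, /fields?/]
content  = "Running in the fields at great speeds, the sleek saluki dog comes from..."

# Keep only the patterns that match somewhere in content.
# Regexp#match? returns true/false and does not populate $~.
matched = patterns.select { |pattern| pattern.match?(content) }
```

Each pattern still scans the string separately, so this is one pass per pattern.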
Solution 2
This has the restriction that no regex in patterns may include a named or numbered capture.
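# [\s\S]* matches any characters, newlines included; in [.\n], the dot is only a literal dot.
combined_regex = Regexp.new(patterns.map{|r| "(?:(?=[\\s\\S]*(#{r.source})))?"}.join)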
content =~ combined_regex
The following part will break if any regex inside patterns
includes a named or numbered capture, because the extra groups shift the capture indexes. If there is a way to know how many potential captures each regex has, that will solve the problem.
# Capture 0 is the whole match, so pattern i corresponds to capture i + 1.
r = patterns.select.with_index{|pattern, i| Regexp.last_match[i + 1]}
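One possible way to count the captures per regex, as a rough sketch rather than a tested general solution: append an empty alternative so the pattern always matches the empty string, then read the size of the resulting `MatchData`. The names `capture_count` and `group_index` and the sample patterns are hypothetical. It assumes the patterns use only numbered captures, contain no backreferences, and carry no per-regex options (`source` drops flags).

```ruby
# Hypothetical sample data: the second pattern has a capture group of its own.
patterns = [/saluki/, /(grey|wolf)hound/, /fields?/]
content  = "Running in the fields at great speeds, the sleek saluki dog comes from..."

# Count capture groups: with an empty alternative appended, the pattern always
# matches "", and the MatchData holds group 0 plus one slot per capture group.
def capture_count(regex)
  Regexp.new("(?:#{regex.source})|").match("").size - 1
end

# Build the combined regex, recording which capture index wraps each pattern.
group_index = []
next_index = 1
parts = patterns.map do |regex|
  group_index << next_index
  next_index += 1 + capture_count(regex) # skip the pattern's own inner groups
  # Optional group around the lookahead: an absent pattern leaves its capture nil.
  "(?:(?=[\\s\\S]*(#{regex.source})))?"
end
combined_regex = Regexp.new(parts.join)

match = combined_regex.match(content)
found = patterns.select.with_index { |_, i| match[group_index[i]] }
```

With these samples, `group_index` skips index 3 because the second pattern owns that inner group.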
Addition
Given:
dogs = {
'saluki' => 'Hounds',
'russian wolfhound' => 'Hounds',
'italian greyhound' => 'Hounds',
..
}
content = "Running in the fields at great speeds, the sleek saluki dog comes from..."
you can do this:
combined_regex =
Regexp.new(dogs.keys.map{|w| "(?:(?=[\\s\\S]*(#{Regexp.escape(w)})))?"}.join, Regexp::IGNORECASE)
content =~ combined_regex
r = dogs.keys.select.with_index{|w, i| Regexp.last_match[i + 1]}
"This article talks about #{r.collect{|x| dogs[x]}.to_sentence}."
=> "This article talks about Hounds."
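For trying this outside Rails, here is a self-contained sketch of the same steps. `to_sentence` comes from ActiveSupport, so the sketch uses a hypothetical stand-in called `join_as_sentence` that only approximates it. The hash holds just the three entries shown above, with the elided ones left out.

```ruby
# Sample data from above; the elided hash entries are omitted.
dogs = {
  'saluki'            => 'Hounds',
  'russian wolfhound' => 'Hounds',
  'italian greyhound' => 'Hounds'
}
content = "Running in the fields at great speeds, the sleek saluki dog comes from..."

# Hypothetical stand-in for ActiveSupport's Array#to_sentence.
def join_as_sentence(words)
  return words.join(' and ') if words.size <= 2
  "#{words[0..-2].join(', ')} and #{words.last}"
end

# One optional lookahead per key; Regexp.escape keeps the keys literal.
combined_regex = Regexp.new(
  dogs.keys.map { |w| "(?:(?=[\\s\\S]*(#{Regexp.escape(w)})))?" }.join,
  Regexp::IGNORECASE
)
match = combined_regex.match(content)

# Capture i + 1 belongs to the i-th key (capture 0 is the whole match).
breeds  = dogs.keys.select.with_index { |_, i| match[i + 1] }
summary = "This article talks about #{join_as_sentence(breeds.map { |b| dogs[b] })}."
```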
To avoid output like "This article talks about Hounds, Hounds and Hounds.", you might want to put uniq in it. The keys in r are already unique, so uniq has to come after collect:
"This article talks about #{r.collect{|x| dogs[x]}.uniq.to_sentence}."