Consider this:
File.readlines('words.txt').map do |word|
array_of_words << word
end
will read the entire file into memory, then convert it into individual elements in an array. You could accomplish the same thing using:
array_of_words = File.readlines('words.txt')
A potential problem is its not scalable. If "words.txt" is larger than the available memory your code will have problems so be careful.
Searching a file for an array of words can be done a number of ways, but I've always found it easiest to use a regular expression. Perl has a great module called Regexp::Assemble that makes it easy to convert a list of words into a very efficient pattern, but Ruby is missing that sort of functionality. See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for one solution I put together in the past to help with that.
Ruby does have Regexp.union
however it's only a partial help.
words = %w(foo bar)
re = Regexp.union(words) # => /foo|bar/
The pattern generated has flags for the expression so you have to be careful with interpolating it into another pattern:
/#{re}/ # => /(?-mix:foo|bar)/
(?-mix:
will cause you problems so don't do that. Instead use:
/#{re.source}/ # => /foo|bar/
which will generate the pattern and behave like we expect.
Unfortunately, that's not a complete solution either, because the words could be found as sub-strings in other words:
'foolish'[/#{re.source}/] # => "foo"
The way to work around that is to set word-boundaries around the pattern:
/\b(?:#{re.source})\b/ # => /\b(?:foo|bar)\b/
which then look for whole words:
'foolish'[/\b(?:#{re.source})\b/] # => nil
More information is available in Ruby's Regexp documentation.
Once you have a pattern you want to use then it becomes a simpler matter to search. Ruby has the Find class, which makes it easy to recursively search directories for files. The documentation covers how to use it.
Alternately, you can cobble your own method using the Dir class. Again, it has examples in the documentation to use it, but I usually go with Find.
When reading the files you're scanning I'd recommend using foreach
to read the files line-by-line. File.read
and File.readlines
are not scalable and can make your program behave erratically as Ruby tries to read a big file into memory. Instead, foreach
will result in very scalable code that runs more quickly. See "Why is "slurping" a file not a good practice?" for more information.
Using the links above you should be able to put something together quickly that'll run efficiently and be flexible.
This untested code should get you started:
WORD_ARRAY = File.readlines('words.txt').map(&:chomp)
WORD_RE = /\b(?:#{Regexp.union(WORD_ARRAY).source}\b)/
Dir['Source/**/*'].select{|f| File.file?(f) }.each do |filepath|
puts "#{filepath}: #{!!File.read(filepath)[WORD_RE]}"
end
It will output the file it's reading, and "true" or "false" whether there is a hit finding one of the words in the list.
It's not scalable because of readlines
and read
and could suffer serious slowdown if any of the files are huge. Again, see the caveats in the "slurp" link above.