There are multiple ways to do this. This is untested, but here's the gist of how I'd go about it:
```ruby
require 'find'

REPLACEMENTS = {
  'target1' => 'replacement1',
  'target2' => 'replacement2'
}

Find.find('./somedir') do |path|
  next if File.directory?(path)
  next unless File.extname(path) == '.feature'

  # Write the edited copy to a scratch file alongside the original.
  new_path = path + '.new'
  File.open(new_path, 'w') do |fo|
    File.foreach(path) do |line|
      REPLACEMENTS.each do |target, replacement|
        line.gsub!(target, replacement)
      end
      fo.puts line
    end
  end

  # Swap the files, keeping the original as a backup.
  old_path = path + '.old'
  File.rename(path, old_path)
  File.rename(new_path, path)
  # File.unlink(old_path)
end
```
There are several things I consider important when writing production code in an enterprise:
- Write code that is scalable. This means it won't die or slow to a crawl if a file is much larger than you expect. `File.foreach` reads files line-by-line, which a lot of people assume means it runs more slowly or involves more work. Testing file I/O, specifically slurping files vs. reading line-by-line, shows that as file sizes grow, `read` (AKA slurping) slows drastically. Unless you are absolutely, totally sure your files will never get over 1 MB, use `foreach`.
- Use a nice starting structure to contain your target words and their replacements. A Hash is a good starting point, as the targets/keys can be sub-strings or regular expressions. Explaining regular expressions is off-topic for this question, but Ruby's `gsub` with regular expressions as hash keys is a great combination. There are lots of answers here showing how to do it, plus the documentation has examples.
- `Find.find` isn't well known; people tend to jump to `Dir[...]` or `Dir.glob`, but I prefer Find. The documentation has a nice example of how to use it. Find seems to scale better, especially when you have to walk huge directories.
- Modifying files and then saving them is usually not done safely, because people assume their code or system will never act up. That's not a good assumption. This code opens a new file, then reads the old one line-by-line. Each line is searched for the targets, the replacements are made, and the line is written to the new file. Once the old file is processed, the new file is closed (as a by-product of using a block with `open`). Then the old file is renamed, and the new file is renamed to the old file's name. That leaves a backup in place in case there was a failure. Then, optionally, you could delete the old file.
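As a quick sketch of the Hash idea above, mixing a string key with a regular-expression key (the patterns and sample text here are invented for illustration):

```ruby
# Hash keys may be plain strings or regular expressions; gsub accepts
# both, so one loop handles literal and pattern-based replacements alike.
REPLACEMENTS = {
  /\bfoo\b/i => 'bar',    # regexp key: case-insensitive whole word
  'colour'   => 'color'   # string key: literal sub-string
}

line = 'Foo likes British colour words'
REPLACEMENTS.each { |pattern, replacement| line.gsub!(pattern, replacement) }
line # => "bar likes British color words"
```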
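And a minimal sketch of why Find scales well on big trees: `Find.prune` lets you skip entire subtrees without ever descending into them (the `.git` and `.feature` names here are just example choices):

```ruby
require 'find'

# Collect '.feature' files under root, pruning any '.git' directories
# so Find never walks into them at all.
def feature_files(root)
  found = []
  Find.find(root) do |path|
    if File.directory?(path)
      Find.prune if File.basename(path) == '.git'
    elsif File.extname(path) == '.feature'
      found << path
    end
  end
  found
end
```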
This results in very low overhead, will run very fast and should scale/grow nicely.
This task can also be easily accomplished using the command-line tools `find` and `sed` in *nix, and it'll run very fast and be extremely scalable, so it's worth researching that path too; it's surprising how easily some file tasks can be done at the prompt or in a shell script.