135

I'm looking for a script to search a file (or list of files) for a pattern and, if found, replace that pattern with a given value.

Thoughts?

the Tin Man
Dane O'Connor
  • _In the answers below, be aware that any recommendations to use `File.read` need to be tempered with the information in http://stackoverflow.com/a/25189286/128421 for why slurping big files is bad. Also, instead of `File.open(filename, "w") { |file| file << content }` variations use `File.write(filename, content)`._ – the Tin Man May 12 '17 at 18:06

11 Answers

208

Disclaimer: This approach is a naive illustration of Ruby's capabilities, and not a production-grade solution for replacing strings in files. It's prone to various failure scenarios, such as data loss in case of a crash, interrupt, or disk being full. This code is not fit for anything beyond a quick one-off script where all the data is backed up. For that reason, do NOT copy this code into your programs.

Here's a quick short way to do it.

file_names = ['foo.txt', 'bar.txt']

file_names.each do |file_name|
  text = File.read(file_name)
  new_contents = text.gsub(/search_regexp/, "replacement string")

  # To merely print the contents of the file, use:
  puts new_contents

  # To write changes to the file, use:
  File.write(file_name, new_contents)
end
the Tin Man
Max Chernyak
  • Does puts write the change back out to the file? I thought that would just print the content to the console. – Dane O'Connor Aug 13 '09 at 21:26
  • Yes, it prints the content to the console. – sepp2k Aug 13 '09 at 21:35
  • Yes, I wasn't sure that's what you wanted. To write, use File.open(file_name, "w") {|file| file.puts output_of_gsub} – Max Chernyak Aug 13 '09 at 21:36
  • I had to use file.write: File.open(file_name, "w") {|file| file.write(text) } – austen Apr 13 '12 at 23:11
  • To write the file, replace the puts line with `File.write(file_name, text.gsub(/regexp/, "replace"))` – tight Mar 06 '14 at 16:33
  • What does the `"w"` mean? – micnguyen Feb 26 '18 at 04:49
  • @micnguyen The `"w"` means "write-only". It's an [open mode](https://ruby-doc.org/core-2.1.4/IO.html#method-c-new-label-IO+Open+Mode) that tells the operating system what kind of file handle you want to get based on your intention. In `"w"` mode specifically you're telling the OS that you want to write to this file, and it will start writing from the beginning, truncating any existing content. Under the hood these open modes become flags passed into the [open system call](https://en.wikipedia.org/wiki/Open_(system_call)). – Max Chernyak May 06 '18 at 00:20
  • I think I'm going to throw down here that nobody should be using this pattern and it should not be the accepted answer. If the process is interrupted (ctrl-c? OOM killer? system crash? cosmic rays?) during the write then you will create a file with truncated data. At a minimum, programmers should always write to a tempfile and then move/rename the file. I just ran into this bug as a user where I ctrl-c'd a command line task and a JSON file the process was writing became irreversibly truncated with data loss + corruption and I had to burn an hour to rebuild it from scratch. – lamont May 07 '19 at 05:24
  • @lamont Good call, added a disclaimer on top. – Max Chernyak Jun 27 '19 at 03:47
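The truncation described in the last few comments happens as soon as the file is opened in `"w"` mode, not when the new content is written. A small sketch that simulates a crash between the open and the write (the file name, contents, and the method name `size_after_interrupted_write` are arbitrary, for illustration only) makes this visible:

```ruby
require 'tmpdir'

# Simulates a process that dies after opening a file in "w" mode but
# before writing the new contents. Returns the file's size afterwards.
def size_after_interrupted_write
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'data.txt')
    File.write(path, "important data\n") # 15 bytes

    begin
      File.open(path, 'w') do |_file|
        raise 'simulated crash before the write' # nothing is written
      end
    rescue RuntimeError
      # File.open closed the handle on the way out; the exception is
      # swallowed here only so we can inspect what is left on disk.
    end

    File.size(path) # the original contents are already gone
  end
end

puts size_after_interrupted_write # prints 0
```

This is why writing to a separate file and renaming it into place is safer: the original is never opened for writing.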
111

Actually, Ruby does have an in-place editing feature. Like Perl, you can say

ruby -pi.bak -e "gsub(/oldtext/, 'newtext')" *.txt

This will apply the code in double quotes to all files in the current directory whose names end with ".txt". Backup copies of edited files will be created by appending a ".bak" extension to the original name (e.g. "foobar.txt.bak").

NOTE: this does not work for multiline searches, because -p feeds the code one line at a time. For those, you have to do it the other, less pretty way, with a wrapper script around the regex.
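A minimal sketch of such a wrapper script, assuming the file fits comfortably in memory: reading the whole file into one string lets the regexp match across newlines. The helper name `multiline_replace` is hypothetical, and it rewrites the file in place, so the truncation risk discussed in the other answers applies here too.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Ruby): apply a regexp to the whole file
# at once so a match may span line breaks. Reads the entire file into
# memory and rewrites it in place, so it only suits small, backed-up files.
def multiline_replace(path, regexp, replacement)
  text = File.read(path)
  File.write(path, text.gsub(regexp, replacement))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.txt')
  File.write(path, "key:\n  value\n")
  # \s also matches "\n", so /:\s+/ consumes the line break; with
  # `ruby -pi -e`, ":" and "value" would never be in the same string.
  multiline_replace(path, /:\s+/, ': ')
  puts File.read(path) # prints "key: value"
end
```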

the Tin Man
Jim Kane
  • What the heck is pi.bak? Without that, I get an error: -e:1:in `<main>': undefined method `gsub' for main:Object (NoMethodError) – Ninad Aug 15 '11 at 19:25
  • @NinadPachpute `-i` edits in place. `.bak` is the extension used for a backup file (optional). `-p` is something like `while gets; – Lri Sep 13 '11 at 07:26
  • This is a better answer than the accepted answer, IMHO, if you're looking to modify the file. – Colin K Oct 19 '11 at 23:26
  • I used this as a starting point to change the case of all occurrences of a regex in a file: `jruby -pi.bak -e "$_.gsub!(/oldtext/){|x| x.upcase}" *.txt` – Colin K Oct 19 '11 at 23:32
  • Or, simpler yet: `jruby -pi.bak -e "gsub(/oldtext/){|x| x.upcase}" *.txt` – Colin K Oct 19 '11 at 23:35
  • How can I use this inside a ruby script?? – Saurabh Jun 05 '15 at 06:28
  • How can I write the new file with the substituted value to a different file name from the original (the one with oldtext remains the same, the new file has newtext)? – Satchel Jun 08 '16 at 00:05
  • There are a lot of ways this can go wrong so test it thoroughly before attempting it against a critical file. – the Tin Man Dec 06 '19 at 20:04
56

Keep in mind that, when you do this, the filesystem could be out of space and you may create a zero-length file. This is catastrophic if you're doing something like writing out /etc/passwd files as part of system configuration management.

Note that in-place file editing like in the accepted answer will always truncate the file and write out the new file sequentially. There will always be a race condition where concurrent readers will see a truncated file. If the process is aborted for any reason (ctrl-c, OOM killer, system crash, power outage, etc.) during the write, then the truncated file will also be left over, which can be catastrophic. This is the kind of data loss scenario which developers MUST consider because it will happen. For that reason, I think the accepted answer should most likely not be the accepted answer. At a bare minimum, write to a tempfile and move/rename the file into place like the "simple" solution at the end of this answer.

You need to use an algorithm that:

  1. Reads the old file and writes out to the new file. (You need to be careful about slurping entire files into memory).

  2. Explicitly closes the new temporary file, which is where an exception may be raised because the file buffers cannot be written to disk when there is no space. (Catch this and clean up the temporary file if you like, but you need to rethrow something or fail fairly hard at this point.)

  3. Fixes the file permissions and modes on the new file.

  4. Renames the new file and drops it into place.

With ext3 filesystems you are guaranteed that the metadata write to move the file into place will not get rearranged by the filesystem and written before the data buffers for the new file are written, so this should either succeed or fail. The ext4 filesystem has also been patched to support this kind of behavior. If you are very paranoid you should call the fdatasync() system call as a step 3.5 before moving the file into place.

Regardless of language, this is best practice. In languages where calling close() does not throw an exception (Perl or C) you must explicitly check the return of close() and throw an exception if it fails.
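The four steps above, including the cleanup-and-rethrow from step 2, could be sketched in Ruby as follows. This is an illustration under assumptions, not the code given later in this answer: the method name `replace_atomically` is hypothetical, it assumes a Unix-like system, it skips the chown part of step 3, and it relies on `File.rename` replacing the target atomically, which only holds within one filesystem. Ruby reports write, fsync, and close failures as exceptions, so there is no return value to check; the `ensure` clause removes the partial temp file on any failure and lets the exception propagate.

```ruby
require 'tempfile'

# Hypothetical helper sketching steps 1-4 (Unix-like systems; no chown).
def replace_atomically(path, regexp, replacement)
  # Temp file in the same directory, so the final rename stays on one
  # filesystem and is atomic.
  tmp = Tempfile.new(".#{File.basename(path)}", File.dirname(path))
  renamed = false
  begin
    # Step 1: stream old -> new line by line instead of slurping.
    File.foreach(path) { |line| tmp.write(line.gsub(regexp, replacement)) }
    # Step 2: flush and close explicitly; a full disk surfaces here as an
    # exception (e.g. Errno::ENOSPC) rather than as a silently short file.
    tmp.fsync
    tmp.close
    # Step 3: copy the permission bits of the original file.
    File.chmod(File.stat(path).mode & 0o7777, tmp.path)
    # Step 4: atomically swap the new file into place.
    File.rename(tmp.path, path)
    renamed = true
  ensure
    # On any failure (even Interrupt), remove the partial temp file;
    # the exception itself still propagates to the caller.
    tmp.close! unless renamed
  end
end
```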

The suggestion above to simply slurp the file into memory, manipulate it and write it out to the file can produce zero-length or truncated files on a full filesystem, because opening the file for writing truncates it before any new data is written. You need to always use FileUtils.mv to move a fully-written temporary file into place.

A final consideration is the placement of the temporary file. If you open a file in /tmp then you have to consider a few problems:

  • If /tmp is mounted on a different file system you may run /tmp out of space before you've written out the file that would otherwise be deployable to the destination of the old file.

  • Probably more importantly, when you try to mv the file across a device mount, the operation is transparently converted to cp behavior. The old file will be opened, the old file's inode will be preserved and reopened, and the file contents will be copied. This is most likely not what you want, and you may run into "text file busy" errors if you try to edit the contents of a running file. This also defeats the purpose of using the filesystem mv commands and you may run the destination filesystem out of space with only a partially written file.

    This also has nothing to do with Ruby's implementation. The system mv and cp commands behave similarly.
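A cheap way to check in advance whether a temp directory shares a device with the destination is to compare `File::Stat#dev` for both paths. The helper name `same_device?` is hypothetical and this is only a heuristic: it suggests whether a rename can stay on one device, and it does not move anything.

```ruby
require 'tmpdir'

# Hypothetical helper: true when both paths report the same device number.
# File.rename raises Errno::EXDEV when source and destination are on
# different devices, and FileUtils.mv then falls back to copy + delete.
# Caveat: on Linux, rename(2) also fails across two mount points of the
# same filesystem (e.g. bind mounts), which this check cannot detect.
def same_device?(path_a, path_b)
  File.stat(path_a).dev == File.stat(path_b).dev
end

target_dir = Dir.pwd
puts "#{Dir.tmpdir} on same device as #{target_dir}? " \
     "#{same_device?(Dir.tmpdir, target_dir)}"
```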

What is more preferable is to open a Tempfile in the same directory as the old file. This ensures that there will be no cross-device move issues. The mv itself should never fail, and you should always get a complete and untruncated file. Any failures, such as device out of space, permission errors, etc., should be encountered during writing the Tempfile out.

The only downsides to the approach of creating the Tempfile in the destination directory are:

  • Sometimes you may not be able to open a Tempfile there, such as if you are trying to 'edit' a file in /proc for example. For that reason you might want to fall back and try /tmp if opening the file in the destination directory fails.
  • You must have enough space on the destination partition in order to hold both the complete old file and the new file. However, if you have insufficient space to hold both copies then you are probably short on disk space and the actual risk of writing a truncated file is much higher, so I would argue this is a very poor tradeoff outside of some exceedingly narrow (and well-monitored) edge cases.

Here's some code that implements the full algorithm (the Windows code is untested and unfinished):

#!/usr/bin/env ruby

require 'tempfile'
require 'fileutils'

def file_edit(filename, regexp, replacement)
  tempdir = File.dirname(filename)
  tempprefix = File.basename(filename)
  tempprefix.prepend('.') unless RUBY_PLATFORM =~ /mswin|mingw|windows/
  tempfile =
    begin
      Tempfile.new(tempprefix, tempdir)
    rescue
      Tempfile.new(tempprefix)
    end
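  # File.foreach streams the file and closes it afterwards; write (rather
  # than puts) keeps each line's newline exactly as it was in the original.
  File.foreach(filename) do |line|
    tempfile.write line.gsub(regexp, replacement)
  end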
  tempfile.fdatasync unless RUBY_PLATFORM =~ /mswin|mingw|windows/
  tempfile.close
  unless RUBY_PLATFORM =~ /mswin|mingw|windows/
    stat = File.stat(filename)
    FileUtils.chown stat.uid, stat.gid, tempfile.path
    FileUtils.chmod stat.mode, tempfile.path
  else
    # FIXME: apply perms on windows
  end
  FileUtils.mv tempfile.path, filename
end

file_edit('/tmp/foo', /foo/, "baz")

And here is a slightly tighter version that doesn't worry about every possible edge case (if you are on Unix and don't care about writing to /proc):

#!/usr/bin/env ruby

require 'tempfile'
require 'fileutils'

def file_edit(filename, regexp, replacement)
  Tempfile.open(".#{File.basename(filename)}", File.dirname(filename)) do |tempfile|
    File.foreach(filename) do |line|
      tempfile.write line.gsub(regexp, replacement)
    end
    tempfile.fdatasync
    tempfile.close
    stat = File.stat(filename)
    FileUtils.chown stat.uid, stat.gid, tempfile.path
    FileUtils.chmod stat.mode, tempfile.path
    FileUtils.mv tempfile.path, filename
  end
end

file_edit('/tmp/foo', /foo/, "baz")

The really simple use-case, for when you don't care about file system permissions (either you're not running as root, or you're running as root and the file is root owned):

#!/usr/bin/env ruby

require 'tempfile'
require 'fileutils'

def file_edit(filename, regexp, replacement)
  Tempfile.open(".#{File.basename(filename)}", File.dirname(filename)) do |tempfile|
    File.foreach(filename) do |line|
      tempfile.write line.gsub(regexp, replacement)
    end
    tempfile.close
    FileUtils.mv tempfile.path, filename
  end
end

file_edit('/tmp/foo', /foo/, "baz")

TL;DR: That should be used instead of the accepted answer at a minimum, in all cases, in order to ensure the update is atomic and concurrent readers will not see truncated files. As I mentioned above, creating the Tempfile in the same directory as the edited file is important here to avoid cross-device mv operations being translated into cp operations if /tmp is mounted on a different device. Calling fdatasync is an added layer of paranoia, but it will incur a performance hit, so I omitted it from this example since it is not commonly practiced.

Community
lamont
  • Instead of opening a temp file in the directory you're in, it will actually automatically create one in the app data directory (on Windows anyway), and from there you can do a file.unlink to delete it. – 13aal Apr 16 '16 at 22:19
  • I really appreciated the extra thought that was put into this. As a beginner, it is very interesting to see the thought patterns of experienced devs who can not just answer the original question, but also comment on the larger context of what the original question actually means. – ramijames Jun 02 '16 at 05:28
  • Programming isn't just about fixing the immediate problem, it's also about thinking way ahead to avoid other problems lying in wait. Nothing irritates a senior developer more than to encounter code that painted the algorithm into a corner, forcing an awkward kludge, when a minor adjustment earlier would have resulted in a nice flow. It can often take hours, or days, of analyzing to understand the goal, and then a few lines replace a page of old code. It's like a game of chess against the data and the system at times. – the Tin Man May 12 '17 at 17:54
12

There isn't really a way to edit files in-place. What you usually do when you can get away with it (i.e. if the files are not too big) is, you read the file into memory (File.read), perform your substitutions on the read string (String#gsub) and then write the changed string back to the file (File.open, File#write).

If the files are big enough for that to be infeasible, what you need to do is read the file in chunks (if the pattern you want to replace won't span multiple lines, then one chunk usually means one line; you can use File.foreach to read a file line by line), and for each chunk perform the substitution on it and append it to a temporary file. When you're done iterating over the source file, close both files and use FileUtils.mv to overwrite the source file with the temporary file.

the Tin Man
sepp2k
  • I like the streaming approach. We deal with large files concurrently so we don't usually have the space in RAM to read the entire file – Shane Jul 26 '11 at 19:46
  • "[Why is “slurping” a file not a good practice?](https://stackoverflow.com/questions/25189262/why-is-slurping-a-file-not-a-good-practice)" might be useful reading in relation to this. – the Tin Man Dec 06 '19 at 20:08
9

Another approach is to use in-place editing inside Ruby (not from the command line):

#!/usr/bin/ruby

def inplace_edit(file, bak, &block)
    old_stdout = $stdout
    argf = ARGF.clone

    argf.argv.replace [file]
    argf.inplace_mode = bak
    argf.each_line do |line|
        yield line
    end
    argf.close

    $stdout = old_stdout
end

inplace_edit 'test.txt', '.bak' do |line|
    line = line.gsub(/search1/,"replace1")
    line = line.gsub(/search2/,"replace2")
    print line unless line.match(/something/)
end

If you don't want to create a backup then change '.bak' to ''.

the Tin Man
DavidGamba
  • This would be better than trying to slurp (`read`) the file. It's scalable and should be very fast. – the Tin Man Nov 26 '14 at 18:25
  • There is a bug somewhere causing Ruby 2.3.0p0 on Windows to fail with permission denied if there are several consecutive inplace_edit blocks working on the same file. To reproduce, split the search1 and search2 tests into 2 blocks. Not closing completely? – mlt Oct 10 '16 at 21:56
  • I'd expect problems with multiple edits of a text file occurring simultaneously. If nothing else you could get a badly mangled text file. – the Tin Man May 12 '17 at 18:01
7

This works for me:

filename = "foo"
text = File.read(filename) 
content = text.gsub(/search_regexp/, "replacestring")
File.open(filename, "w") { |file| file << content }
mahemoff
Alain Beauvois
6

Here's a solution for find/replace in all files of a given directory. Basically I took the answer provided by sepp2k and expanded it.

# First set the files to search/replace in (skipping directories)
files = Dir.glob("/PATH/*").select { |f| File.file?(f) }

# Then set the variables for find/replace
@original_string_or_regex = /REGEX/
@replacement_string = "STRING"

files.each do |file_name|
  text = File.read(file_name)
  # gsub (not gsub!): gsub! returns nil when nothing matches, which would
  # overwrite the file with an empty line.
  replace = text.gsub(@original_string_or_regex, @replacement_string)
  File.write(file_name, replace)
end
tanner
3
require 'trollop'

opts = Trollop::options do
  opt :output, "Output file", :type => String
  opt :input, "Input file", :type => String
  opt :ss, "String to search", :type => String
  opt :rs, "String to replace", :type => String
end

text = File.read(opts[:input])
text.gsub!(opts[:ss], opts[:rs])
File.open(opts[:output], 'w') { |f| f.write(text) }
Ninad
  • It helps more if you supply an explanation why this is the preferred solution and explain how it works. We want to educate, not just provide code. – the Tin Man Dec 06 '19 at 20:13
  • trollop was renamed optimist https://github.com/manageiq/optimist. Also it's just a CLI option parser not really required to answer the question. – noraj May 03 '20 at 17:33
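Along the lines of that last comment, the option parsing does not need a gem at all. Below is a sketch of the same four options using Ruby's standard-library `OptionParser`; the `parse_options` and `run` method names are hypothetical, and, like the Trollop version, it reads the whole input file into memory and treats `--ss` as a literal string rather than a regexp.

```ruby
require 'optparse'

# Parse the same four options the Trollop version defines.
def parse_options(argv)
  opts = {}
  OptionParser.new do |parser|
    parser.on('--input FILE', 'Input file') { |v| opts[:input] = v }
    parser.on('--output FILE', 'Output file') { |v| opts[:output] = v }
    parser.on('--ss STRING', 'String to search') { |v| opts[:ss] = v }
    parser.on('--rs STRING', 'String to replace') { |v| opts[:rs] = v }
  end.parse(argv) # non-destructive: argv itself is left unchanged
  opts
end

# Hypothetical entry point: read, replace every occurrence, write.
def run(argv)
  opts = parse_options(argv)
  text = File.read(opts.fetch(:input))
  File.write(opts.fetch(:output), text.gsub(opts.fetch(:ss), opts.fetch(:rs)))
end

# Only run when executed directly with arguments, e.g.
#   ruby replace.rb --input in.txt --output out.txt --ss foo --rs bar
run(ARGV) if $PROGRAM_NAME == __FILE__ && ARGV.any?
```

`opts.fetch` raises a `KeyError` naming the missing option, which is a blunt but explicit failure if one of the four flags is left out.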
1

Here's an alternative to Jim's one-liner, this time as a script:

ARGV[0..-3].each{|f| File.write(f, File.read(f).gsub(ARGV[-2],ARGV[-1]))}

Save it as a script, e.g. replace.rb.

Then run it from the command line with

ruby replace.rb *.txt <string_to_replace> <replacement>

*.txt can be replaced with another glob, or with explicit file names or paths.

Here it is broken down so that I can explain what's happening (still executable):

# ARGV is an array of the arguments passed to the script.
ARGV[0..-3].each do |f| # enumerate the arguments from the first up to the third-to-last (index -3); these are the file names
  File.write(f,  # open the argument (= filename) for writing
    File.read(f) # open the argument (= filename) for reading
    .gsub(ARGV[-2],ARGV[-1])) # and replace all occurrences of the second-to-last argument with the last argument (string)
end

EDIT: If you want to use a regular expression, use this instead. Obviously, this is only for handling relatively small text files, not gigabyte monsters:

ARGV[0..-3].each{|f| File.write(f, File.read(f).gsub(/#{ARGV[-2]}/,ARGV[-1]))}
peter
  • This code won't work. I'd suggest testing it before posting, then copy and paste the working code. – the Tin Man Dec 06 '19 at 20:15
  • @theTinMan I always test before publishing, if possible. I tested this and it works, both the short as the commented version. Why do you think it wouldn't ? – peter Dec 08 '19 at 19:44
  • if you mean using a regular expression see my edit, also tested :>) – peter Dec 08 '19 at 20:07
1

If you need to do substitutions across line boundaries, then using ruby -pi -e won't work because -p processes one line at a time. Instead, I recommend the following, although it could fail with a multi-GB file:

ruby -e "file='translation.ja.yml'; IO.write(file, (IO.read(file).gsub(/\s+'$/, %q('))))"

This looks for whitespace (potentially including newlines) followed by a quote at the end of a line, and removes the whitespace. The %q(') is just a fancy way of quoting the quote character.

Dan Kohn
0

I am using the tty-file gem.

Apart from replacing, it includes append, prepend (on a given text/regex inside the file), diff, and others.

user9869932