13

I know how to write to a file, and read from a file, but I don't know how to modify a file besides reading the entire file into memory, manipulating it, and rewriting the entire file. For large files this isn't very productive.

I don't really know the difference between append and write.

E.g.

If I have a file containing:

Person1,will,23
Person2,Richard,32
Person3,Mike,44

How would I delete just the line containing Person2?

sites
Senjai
  • Sawa, you're always helping me out. So every time a program saves a file, it overwrites the entire file? – Senjai May 19 '13 at 19:44
  • How do you plan to find which lines to remove without reading the file? Is it always a certain line number? – Shawn Balestracci May 19 '13 at 19:48
  • @Senjai Sergio suggests something that might help, and if that is correct, then my previous comments are wrong. Sorry about that. – sawa May 19 '13 at 19:56
  • Do you want to delete Person2 or the line containing Person2? – sites May 19 '13 at 19:56
  • The line containing Person2. I would use a regular expression to find the line. No guarantee it would be on the same line every time. – Senjai May 19 '13 at 20:05
  • [This](http://stackoverflow.com/questions/508983) might be related, but it is in Python. – sawa May 19 '13 at 20:21

4 Answers

15

You can delete a line in several ways:

  • Simulate deletion. That is, just overwrite the line's content with spaces. Later, when you read and process the file, just ignore such blank lines.

    Pros: this is easy and fast. Cons: it's not real deletion of data (the file doesn't shrink), and you need to do more work when reading/processing the file.

    Code:

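    filename = 'file.txt'

    File.open(filename, 'r+') do |f|
      f.each_line do |line|
        # Example condition; put your own test for "should be deleted" here.
        next unless line.start_with?('Person2')

        # Seek back to the beginning of the line. Use bytesize rather than
        # length: seek counts bytes, and the two differ for multibyte text.
        f.seek(-line.bytesize, IO::SEEK_CUR)

        # Overwrite the line with spaces and add a newline char.
        f.write(' ' * (line.bytesize - 1))
        f.write("\n")
      end
    end

    File.foreach(filename) { |line| p line }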
    
    # >> "Person1,will,23\n"
    # >> "                  \n"
    # >> "Person3,Mike,44\n"
    
  • Do real deletion. This means that the line will no longer exist. To do this in place, you have to read the next line and overwrite the current line with it, then repeat this for all following lines until the end of the file is reached. This is an error-prone task (lines of different lengths, etc.), so here's a safer alternative: open a temp file, write to it the lines up to (but not including) the line you want to delete, skip that line, and write the rest to the temp file. Delete the original file and rename the temporary one to use its name. Done.

    While this is technically a total rewrite of the file, it does differ from what you described. The file doesn't need to be loaded fully into memory; you need only one line at a time. Ruby provides a method for this: IO#each_line.

    Pros: No assumptions. Lines really get deleted. The reading code does not need to be altered. Cons: a lot more work when deleting the line (not only the code, but also I/O and CPU time).

    There is a snippet that illustrates this approach in @azgult's answer.
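
For the first option (blanking lines out), the reading side has to skip the blanked lines. A minimal sketch of that, assuming a "deleted" line is any line containing only whitespace; `each_live_line` is a hypothetical helper name, not part of Ruby:

```ruby
# Hypothetical helper: yields each line of the file at `path`, skipping
# lines that contain only whitespace (lines "deleted" by blanking them out).
def each_live_line(path)
  File.foreach(path) do |line|
    next if line.strip.empty? # blanked-out (or genuinely empty) line
    yield line
  end
end
```

Something like `each_live_line('file.txt') { |line| puts line }` would then print only the surviving records. Note that this also skips lines that were empty to begin with, which may or may not be what you want.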

Sergio Tulentsev
  • Is it possible to overwrite just a portion of a file (with spaces) without overwriting the entire file? – sawa May 19 '13 at 19:52
  • Sure, it's possible. Open a file in write mode, seek to needed offset and start writing. – Sergio Tulentsev May 19 '13 at 19:55
  • Not write mode, read/write mode (the 'r+' flag) is needed to overwrite parts. – azgult May 19 '13 at 19:56
  • Probably it helps to expand on that (showing the actual code) in the answer. – sawa May 19 '13 at 19:58
  • @azgult: Right, sorry. I meant: mode that enables you to write and doesn't truncate the file. Which is `r+` or `a+`. – Sergio Tulentsev May 19 '13 at 19:58
  • @sawa, what's the difference between `f.each_line do |line|` and `File.open('input.txt', 'r').each do |line|`? – sites May 19 '13 at 20:06
  • @juanpastas: no difference, these are aliases. – Sergio Tulentsev May 19 '13 at 20:10
  • Thank you Sergio, I appreciate this. So I would use w+, iterate through every line, rewriting each line unless I come to the line I don't want to rewrite? Is there a way of doing this without using a temporary file? – Senjai May 19 '13 at 20:12
  • @SergioTulentsev What I asked for was code for your first option (not overwriting the entire file, but moving the offset and rewriting only a portion of it). But your link is code for the second option. @juanpastas: I don't think they are different. – sawa May 19 '13 at 20:13
  • I don't know how to do it. – sawa May 19 '13 at 20:15
  • From the Ruby docs: `f = File.new("testfile")`, then `f.sysseek(-13, IO::SEEK_END)` (any position from EOF), then `f.syswrite("Hello")`. – Nerve May 19 '13 at 20:33
8

Since a file is essentially a contiguous sequence of bytes, removing any part of it necessitates rewriting at least everything that comes after it. This does in essence mean that, as you say, it isn't particularly efficient for large files. It is therefore generally a good idea to limit file sizes so that such problems don't occur.

A "compromise" solution is to copy the file line by line to a second file and then move that file over the first. This avoids loading the whole file into memory but does not avoid the disk I/O:

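require 'fileutils'

# Copy every line except the unwanted one to a temporary file.
File.open('file.txt', 'r') do |f|
  File.open('file.txt.tmp', 'w') do |f2|
    f.each_line do |line|
      f2.write(line) unless line.start_with?('Person2')
    end
  end
end
# Replace the original with the filtered copy.
FileUtils.mv 'file.txt.tmp', 'file.txt'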

Even more efficient would be to open the file in read-write mode, skip ahead to the position you want to delete, and then shift the rest of the data back, but that would make for some quite ugly code (and I can't be bothered to write that now).
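
For illustration, here is a rough sketch of what that in-place shift might look like. It is not taken from the answer: `delete_line_in_place` and its `chunk_size` parameter are hypothetical names, it removes only the first matching line, and it assumes `pattern` is an ASCII-only Regexp because the file is opened in binary mode.

```ruby
# Hypothetical helper: removes the first line matching `pattern` from the
# file at `path` in place, shifting the rest of the data back in chunks.
# Returns true if a line was removed, false if nothing matched.
def delete_line_in_place(path, pattern, chunk_size: 64 * 1024)
  File.open(path, 'r+b') do |f|
    write_pos = nil

    # Find the byte offset at which the matching line starts.
    until f.eof?
      line_start = f.pos
      line = f.gets
      if pattern.match?(line)
        write_pos = line_start
        break
      end
    end
    return false if write_pos.nil?

    read_pos = f.pos # first byte after the line being deleted

    # Copy everything after the line back over it, one chunk at a time.
    loop do
      f.seek(read_pos)
      chunk = f.read(chunk_size)
      break if chunk.nil? # end of file reached
      read_pos += chunk.bytesize

      f.seek(write_pos)
      f.write(chunk)
      write_pos += chunk.bytesize
    end

    f.flush
    f.truncate(write_pos) # cut off the leftover tail
    true
  end
end
```

A call such as `delete_line_in_place('file.txt', /\APerson2,/)` would leave everything before the line untouched on disk. Unlike the temp-file approach, an interruption halfway through leaves the file in an inconsistent state, so this trades safety for less I/O.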

azgult
4

You could open the file and read it line by line, appending lines you want to keep to a new file. This allows you the most control over which lines are kept, without destroying the original file.

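# Placeholder predicate: put the logic that decides whether to keep a line here.
def keep_line(line)
  !line.start_with?('Person2')
end

File.open('output_file_path', 'w') do |output| # 'w' for a new file, 'a' to append to an existing one
  File.open('input_file_path', 'r') do |input|
    input.each_line do |line| # iterate over every line, not just the first
      output.write(line) if keep_line(line)
    end
  end
end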

If you know the position of the beginning and end of the chunk you want to remove, you can open the file, read to the start, then seek to the end and continue reading.

Look up the parameters of the read method, and read about seeking, here:

http://ruby-doc.org/core-2.0/IO.html#method-i-read
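
As a sketch of the seek-based idea above: if the byte offsets of the chunk are already known, the copy can jump over it instead of testing every line. `copy_without_range`, `skip_start` and `skip_end` are hypothetical names chosen for this example; the offsets are byte positions, with `skip_end` exclusive.

```ruby
# Hypothetical helper: copies `src` to `dst`, omitting the bytes in the
# half-open range [skip_start, skip_end).
def copy_without_range(src, dst, skip_start, skip_end)
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      IO.copy_stream(input, output, skip_start) # bytes before the chunk
      input.seek(skip_end)                      # jump past the chunk
      IO.copy_stream(input, output)             # everything after it
    end
  end
end
```

With the sample file from the question, the Person2 line occupies bytes 16 to 34, so `copy_without_range('input_file_path', 'output_file_path', 16, 35)` would drop exactly that line.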

Matt
0

Read here:

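File.open('output.txt', 'w') do |out_file|
  File.foreach('input.txt') do |line|
    # Removes only the text 'Person2' and keeps the rest of the line.
    # To drop the whole line instead: next if line.include?('Person2')
    out_file.print line.sub('Person2', '')
  end
end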
sites