
I am trying to do something similar to this question: Parse a CSV file, update a field, then save

Essentially I have a CSV where each line is a comma-separated list of IPs, and each row represents a different IP group. I pass this CSV to a function that does some work on each IP per row. Afterwards I want to append a status and a timestamp to the line. The solution above requires creating a second file; is there a way to do this without the additional file, just appending to each row in place?

require 'csv'

CSV.open('csv_with_ips.csv', 'r+').each do |row|
  # do some stuff with the IPs in "row"
  row << 'SCANNED'
  row << Time.now
end

1 Answer


You don't have to create a new file, but good practice says you should.

Updating a line in place with a longer line forces the OS to overwrite the start of the following line, and possibly subsequent ones. If the code crashes before processing the entire file, the content will be corrupted.

Also, doing what you want implies reading the entire file into memory, which isn't a good idea because your code won't scale well. And if you don't read it all into memory first, each longer line you write will stomp on the next line you want to read, corrupting it.
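To make that concrete, here's a minimal, self-contained sketch (against a throwaway temp file, not real data) showing a longer rewrite clobbering the next row:

require 'tempfile'

Tempfile.create('demo') do |f|
  f.write("line one\nline two\n")
  f.rewind
  first = f.readline                   # read "line one\n"
  f.seek(0)                            # back to the start of the file
  f.write(first.chomp + ",SCANNED\n")  # 17 bytes written over a 9-byte line
  f.rewind
  puts f.read                          # "line one,SCANNED\n\n" -- row two is gone
end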

Safe processing of the data would go something like this:

File.open('output.csv', 'w') do |fo|
  File.foreach('input.csv') do |li|
    # modify the input line "li"
    fo.puts(li)
  end
end

File.rename('input.csv', 'input.csv.bak')
File.rename('output.csv', 'input.csv')

But, of course, you'd want to take advantage of the CSV class that comes with Ruby and let it handle the heavy lifting. The CSV format isn't as simple as people think, and it's easy to mangle the data, so take advantage of what the CSV class implements.
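As a sketch, the copy-then-rename pattern above might look like this with the CSV class, appending the status and timestamp columns from the question (this assumes no header row and default CSV options):

require 'csv'

CSV.open('output.csv', 'w') do |out|
  CSV.foreach('csv_with_ips.csv') do |row|
    # do some stuff with the IPs in "row"
    row << 'SCANNED'
    row << Time.now.to_s
    out << row
  end
end

File.rename('csv_with_ips.csv', 'csv_with_ips.csv.bak')
File.rename('output.csv', 'csv_with_ips.csv')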

– the Tin Man
  • If the contents of those input CSV files are small enough to comfortably fit in memory, he could also process and store each line in an array and then dump it all at once to a file of the same name. – Alexandre Angelim Feb 16 '17 at 22:50
  • While that seems like it'd work, it only works until a file is read that exceeds memory. In the environment I work in we regularly see files well into the 10s of GB, so slurping them will kill processing speed, whereas line-by-line IO will not slow down. See http://stackoverflow.com/q/25189262/128421 for some metrics on why it's not a good idea. Besides that, if the code crashes during the write the data file will still be corrupted. Preserving production data is extremely important. – the Tin Man Feb 16 '17 at 23:40
  • Agreed. Thanks for the reference link. – Alexandre Angelim Feb 16 '17 at 23:51
  • File.mv returned a no method error. Just did File.rename. Thanks, this worked. – Trevor Steen Feb 17 '17 at 14:53
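For files known to comfortably fit in memory, a minimal sketch of the slurp-and-rewrite approach from the first comment (with the caveat from the answer: a crash mid-write still corrupts the file):

require 'csv'

rows = CSV.read('csv_with_ips.csv')    # the whole file in memory
rows.each do |row|
  # do some stuff with the IPs in "row"
  row << 'SCANNED'
  row << Time.now.to_s
end

CSV.open('csv_with_ips.csv', 'w') do |out|
  rows.each { |row| out << row }
end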