1

I have a long CSV file with two columns of numbers:

1,2
2,5
7,3
etc...

I would like to add a third column equal to the sum of the first two:

1,2,3
2,5,7
7,3,10

The following code is a solution to the problem, and it makes a copy of the input file, with the third column appended. Instead, I would like to operate on the input file line by line, writing the third column to each line as I went along. If the process gave error through for some reason, the answers to the first half of the file should already be saved and would not need to be recalculated.

I can't come up with a good way to do this using ruby's CSV class. Here's my current solution with the copied file:

require 'csv'

CSV.open("big_file.csv", "w") do |csv|
  csv << %w{1 2}
  csv << %w{2 5}
  csv << %w{3 8}
end

big_csv_file = CSV.open("big_file.csv", 'r')


# I'm creating a copy of big_file.csv here
# I'd rather edit it in place

CSV.open("copy_with_extra_column.csv", "w") do |csv|
  big_csv_file.each do |row|
    row << eval(row[0] + row[1])
    csv << row
  end
end
ekremkaraca
  • 1,453
  • 2
  • 18
  • 37
Jonah
  • 15,806
  • 22
  • 87
  • 161
  • How about writing each answer to a second file, then you could manage memory carefully and any errors would be ignored and your program can continue... if that's what you're after. – Rots May 31 '14 at 10:45
  • Also the `eval` seems a bit hacky, just do `sum = row.map(&:to_i).reduce(:+)` – Daniël Knippers May 31 '14 at 10:57
  • possible duplicate of [Read, edit, and write a text file line-wise using Ruby](http://stackoverflow.com/questions/4397412/read-edit-and-write-a-text-file-line-wise-using-ruby) – Jonah Jun 01 '14 at 17:19

3 Answers3

1

A file is like one long string, for example:

1,2\n2,5

However, unlike a string, you can only overwrite characters in a file. In the example above, there are 7 characters. You can overwrite any of those characters with any characters you choose. So for instance, if you put the sum of the numbers at position 0 and position 2 into position 3, the result is:

1,232,5

That's probably not what you want because it looks like the first two numbers are 1 and 232 and their sum is 5. However, that is all you can do when editing a file inplace: you can only overwrite characters with other characters.

For a large file, you can read in one line, then write the altered line to a new file. When you are done, you can delete the original file, and then you can rename the new file to the old file name. You can use the Tempfile class to avoid name clashes for the new file name.

7stud
  • 46,922
  • 14
  • 101
  • 127
1

To put this another, way, there is no way, at the fundamental file level, to "insert" the sum into the file. In your example:

1,2
2,5
7,2

If we ignore the whole notion of a "CSV" file (which is really just a concept layered on top of a stream text file) To "insert" the text ,3 at the end of the first line, we need to do all of these things:

  1. move the "\n" after the 2, and all the following text two positions later in the file (leaving some junk in its place)
  2. overwrite the junk with ",3"

Then you would repeat this process for each additional row.

This is obviously very inefficient. In simple terms, the CSV file format is not designed for efficient insertion of data.

Your two options are:

  1. Load the file into memory (as, i.e., an array of lines), operate on it there, and then write it all back out over the existing file. Assuming your file only grows, this will work fine, but you'll need to be willing to allocate enough memory to read and operate on the whole file.
  2. Write to a temporary file as you work through the data, and then move the temporary file in place of the original when you're done.

Updating the file "in place" is not practical.

gwcoffey
  • 5,551
  • 1
  • 18
  • 20
0

Instead of CSV.open(), try CSV.read(). For example, it's obviously a little ugly, but:

big_csv_file = CSV.read("big_file.csv")

big_csv_file[0] << eval(big_csv_file[0][0] + big_csv_file[0][1])

CSV.open("copy_with_extra_column.csv", "w") do |csv|
  big_csv_file.each do |row|
    csv << row
  end
end

If you need the file to always be at the latest, the alterations and the writing will need to be in a loop, obviously.

John C
  • 1,931
  • 1
  • 22
  • 34
  • Reading the entire big file into memory is probably not a desired solution, assuming it can be multiple GBs. – Daniël Knippers May 31 '14 at 10:57
  • You should edit your question to say so @DaniëlKnippers. – RubberDuck May 31 '14 at 11:14
  • 2
    I believe OP is asking if the original file can be edited in place, line by line, essentially reading and writing at the same time. Not sure this is possible. http://stackoverflow.com/questions/4397412/read-edit-and-write-a-text-file-line-wise-using-ruby – fletcher May 31 '14 at 11:17
  • @DaniëlKnippers, I agree that it's problematic, but since `CSV` only has a couple of methods, that's pretty much it. If you don't want either, you'll need to code the file management manually. – John C May 31 '14 at 11:33