I have an application that needs to build a TSV file from a data file that has a couple hundred million rows.

Right now my code to build the TSV file looks something like this:

File.open(data, 'rb').each { |line|
  row = to_tsv_row(line) # this formats the row to be delimited by a tab
  open(tsv_path, "a+") { |f| f << row }
}

This seems like a rather slow and inefficient way to build a TSV file. Is there a library out there that could do this quickly and efficiently?

Jackson

1 Answer

You can use Ruby's CSV library:

require 'csv'
CSV.open(tsv_path, 'w', col_sep: "\t") do |tsv|
  File.open(data, 'rb').each do |line|
    row = to_row(line) # this only needs to convert line to an array of Objects responding to to_s
    tsv << row
  end
end
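
If your to_tsv_row already returns a tab-delimited, newline-terminated string as in the question, the quoting that CSV does may not even be necessary; the main cost in the original snippet is re-opening the output file for every single row. Here is a minimal plain-IO sketch along those lines (it assumes to_tsv_row behaves as described in the question):

File.open(tsv_path, 'w') do |out|
  File.open(data, 'rb') do |input|
    # Stream the input and write every converted row to one open handle
    input.each_line do |line|
      out << to_tsv_row(line) # assumed to return a "\t"-delimited, "\n"-terminated string
    end
  end
end

Keeping both file handles open for the whole loop avoids the per-row open/close that dominates the original version.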

With the CSV code above, it takes about 10 seconds to write a million rows on my system, though that depends a lot on your data and hardware. A single TSV with a hundred million rows sounds like a bad idea in the first place.
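
At roughly 10 seconds per million rows, a couple hundred million rows extrapolates to somewhere around half an hour to an hour of wall-clock time, so it is worth timing a fixed slice of your own data before committing to the full run. A rough sketch using Ruby's standard Benchmark module (SAMPLE_ROWS and the sample output path are placeholders, and to_row is the same helper as above):

require 'benchmark'
require 'csv'

# Time a fixed slice of the real input and extrapolate from it.
SAMPLE_ROWS = 1_000_000

elapsed = Benchmark.realtime do
  CSV.open('sample.tsv', 'w', col_sep: "\t") do |tsv|
    File.open(data, 'rb') do |input|
      input.each_line.with_index do |line, i|
        break if i >= SAMPLE_ROWS
        tsv << to_row(line)
      end
    end
  end
end

puts format('%.1fs for %d rows', elapsed, SAMPLE_ROWS)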

Max