There is missing detail in your question, such as how many lines are in the files and whether the index file is sorted. Without that information, and assuming the worst case of huge files and an unsorted index file, I'd use something like this:
File.open('new.csv', 'w') do |new_csv|
  File.foreach('index.csv') do |line_num|
    # Reopen the original file for each index so reading always starts at line 1.
    File.open('original.csv', 'r') do |original_csv|
      original_line = ''
      # Read forward line_num lines; the last line read is the one we want.
      line_num.to_i.times do
        original_line = original_csv.gets
      end
      new_csv.puts original_line
    end
  end
end
Assuming an index.csv of:
1
3
5
7
9
and an original.csv of:
row1
row2
row3
row4
row5
row6
row7
row8
row9
row10
Running the code creates new.csv:
> cat new.csv
row1
row3
row5
row7
row9
CSV files are text, so it's not necessary to use the CSV class to read or write them if we're only concerned with the individual lines.
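If you did need field-level access, the standard CSV class would handle the parsing. Here's a minimal sketch of the same filtering using it, assuming the same file names and an index.csv small enough to slurp:

require 'csv'

# Only needed when rows must be parsed into fields, not treated as plain lines.
indexes = File.readlines('index.csv').map(&:to_i)
CSV.open('new.csv', 'w') do |out|
  CSV.foreach('original.csv').with_index(1) do |row, lineno|
    out << row if indexes.include?(lineno)
  end
end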
The code could be changed to use readlines, slurping the input files and index into arrays, but that wouldn't be scalable. The suggested code rereads original.csv for each line in index.csv, but it also handles files of arbitrary size, something that's very important in production environments.
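To make the tradeoff concrete, here's a minimal sketch of the slurping version warned against above, assuming the same file names. It's short, but it loads both files entirely into memory:

# For illustration only: fails once either file no longer fits in memory.
indexes = File.readlines('index.csv').map(&:to_i)
lines   = File.readlines('original.csv')
File.open('new.csv', 'w') do |new_csv|
  indexes.each { |i| new_csv.puts lines[i - 1] }  # indexes are 1-based
end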
For instance, if index.csv is known to be small and unsorted:
File.open('new.csv', 'w') do |new_csv|
  # index.csv is small, so it's safe to slurp into memory. Sorting isn't
  # required for include?, but it costs little on a small index.
  indexes = File.readlines('index.csv').map(&:to_i).sort
  # Stream original.csv once, keeping only lines whose 1-based number
  # appears in the index.
  File.foreach('original.csv').with_index(1) do |original_line, original_lineno|
    new_csv.puts original_line if indexes.include?(original_lineno)
  end
end
That runs more quickly because it iterates through original.csv only once, but it opens up a potential scalability problem if index.csv grows too large.
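If index.csv could also grow large, sorting it first (for example, with the *nix sort utility, which handles files bigger than memory) allows both files to be streamed in a single pass. A minimal sketch, assuming index.csv is pre-sorted ascending, 1-based, and free of duplicates:

File.open('new.csv', 'w') do |new_csv|
  File.open('index.csv') do |index_io|
    target = index_io.gets&.to_i
    File.foreach('original.csv').with_index(1) do |line, lineno|
      break if target.nil?          # index exhausted; stop reading early
      next unless lineno == target
      new_csv.puts line
      target = index_io.gets&.to_i  # advance to the next wanted line number
    end
  end
end

This keeps memory use constant regardless of either file's size, at the cost of requiring the index to be sorted first.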