There is missing detail in your question, such as how many lines are in the files and whether the index file is sorted. Without that information, and assuming the worst case of huge files and an unsorted index file, I'd use something like this:
File.open('new.csv', 'w') do |new_csv|
  File.foreach('index.csv') do |line_num|
    # Reopen the original file for each index so reading always starts at line 1.
    File.open('original.csv', 'r') do |original_csv|
      original_line = ''
      # Read forward line_num lines; the last line read is the one we want.
      line_num.to_i.times do
        original_line = original_csv.gets
      end
      new_csv.puts original_line
    end
  end
end
Assuming an index.csv of:
1
3
5
7
9
and an original.csv of:
row1
row2
row3
row4
row5
row6
row7
row8
row9
row10
Running the code creates new.csv:
> cat new.csv
row1
row3
row5
row7
row9
CSV files are text, so it's not necessary to use the CSV class to read or write them if we're only concerned with the individual lines.
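If you did need field-level access, the standard CSV class would handle the parsing. Here's a minimal sketch of the same filtering using it, assuming the same file names and an index.csv small enough to slurp:

require 'csv'

# Only needed when rows must be parsed into fields, not treated as plain lines.
indexes = File.readlines('index.csv').map(&:to_i)
CSV.open('new.csv', 'w') do |out|
  CSV.foreach('original.csv').with_index(1) do |row, lineno|
    out << row if indexes.include?(lineno)
  end
end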
The code could be changed to use readlines, slurping the input files and index into arrays, but that wouldn't be scalable. The suggested code rereads original.csv for each line in index.csv, but it also handles files of arbitrary size, something that's very important in production environments.
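To make the tradeoff concrete, here's a minimal sketch of the slurping version warned against above, assuming the same file names. It's short, but it loads both files entirely into memory:

# For illustration only: fails once either file no longer fits in memory.
indexes = File.readlines('index.csv').map(&:to_i)
lines   = File.readlines('original.csv')
File.open('new.csv', 'w') do |new_csv|
  indexes.each { |i| new_csv.puts lines[i - 1] }  # indexes are 1-based
end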
For instance, if index.csv is known to be small and unsorted:
File.open('new.csv', 'w') do |new_csv|
  # index.csv is small, so it's safe to slurp into memory. Sorting isn't
  # required for include?, but it costs little on a small index.
  indexes = File.readlines('index.csv').map(&:to_i).sort
  # Stream original.csv once, keeping only lines whose 1-based number
  # appears in the index.
  File.foreach('original.csv').with_index(1) do |original_line, original_lineno|
    new_csv.puts original_line if indexes.include?(original_lineno)
  end
end
That runs more quickly because it iterates through original.csv only once, but it opens up a potential scalability problem if index.csv grows too large.
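If index.csv could also grow large, sorting it first (for example, with the *nix sort utility, which handles files bigger than memory) allows both files to be streamed in a single pass. A minimal sketch, assuming index.csv is pre-sorted ascending, 1-based, and free of duplicates:

File.open('new.csv', 'w') do |new_csv|
  File.open('index.csv') do |index_io|
    target = index_io.gets&.to_i
    File.foreach('original.csv').with_index(1) do |line, lineno|
      break if target.nil?          # index exhausted; stop reading early
      next unless lineno == target
      new_csv.puts line
      target = index_io.gets&.to_i  # advance to the next wanted line number
    end
  end
end

This keeps memory use constant regardless of either file's size, at the cost of requiring the index to be sorted first.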