
I have a file Index.csv that contains the following data:

100
200
300
400
500
600
700
800
900
1000

I need to print, or save into a new file New.csv, the rows of a CSV file Original.csv whose line numbers are listed in Index.csv. How do I do that?

I could not do it, so I copied the contents of Index.csv into an array, and wrote the following code, but it's not working:

array = [100,200,300,400,500,600,700,800,900,1000]
CSV.open('New.csv', "wb") do |csv|
  f = File.open('Original.csv', "r")
  f.each_line { |line|
    row = line.split(",")
    for i in 0..array.size
      if array[i]==line
        csv<<row
      end
    end
  }
end
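A short irb-style check shows why the comparison array[i]==line in the code above never succeeds: each_line yields Strings that still end in a newline, while array holds Integers, and 0..array.size runs one step past the last element. The code also compares a line's content with the index, whereas Index.csv holds line numbers; with_index(1) is one way to get those. The sample values below are made up for illustration.

```ruby
array = [100, 200, 300]
line = "100\n"                        # each_line yields Strings, newline included

puts array[0] == line                 # false: an Integer never equals a String
puts array[0] == line.chomp.to_i      # true once the String is converted
p array[array.size]                   # nil: 0..array.size goes one past the end

# with_index(1) pairs each line with its 1-based line number.
numbered = "a\nb\nc\n".each_line.with_index(1).map { |l, n| [n, l.chomp] }
p numbered                            # [[1, "a"], [2, "b"], [3, "c"]]
```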
sawa
Kristada673
  • Look at this question to find out how to read nth line from a file - http://stackoverflow.com/questions/4014352/ruby-getting-a-particular-line-from-a-file – Wand Maker Dec 15 '15 at 16:47
  • What does "the rows of a CSV file `Original.csv` as described in `Original.csv`" mean? – the Tin Man Dec 15 '15 at 17:55
  • How big can Index.csv and Original.csv become? Will Index.csv always be in an ascending order? – the Tin Man Dec 15 '15 at 17:57

2 Answers


Your question is missing some details, such as how many lines the files contain and whether the index file is sorted. Without that information, and assuming the worst (huge files and an unsorted index file), I'd use something like this:

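File.open('new.csv', 'w') do |new_csv|
  # Stream the index so it never has to fit in memory.
  File.foreach('index.csv') do |line_num|
    # Reopen original.csv for each entry, so the index can be in any order.
    File.open('original.csv', 'r') do |original_csv|
      original_line = nil
      # Read forward to the requested line; gets returns nil past the end.
      line_num.to_i.times do
        original_line = original_csv.gets
      end
      # Skip out-of-range or blank index entries instead of writing empty lines.
      new_csv.puts original_line if original_line
    end
  end
end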

Assuming an index.csv of:

1
3
5
7
9

and an original.csv of:

row1
row2
row3
row4
row5
row6
row7
row8
row9
row10

Running the code creates new.csv:

> cat new.csv
row1
row3
row5
row7
row9

CSV files are text, so it's not necessary to use the CSV class to read or write them if we're only concerned with the individual lines.
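That holds as long as every row fits on one line. As an assumption about the data: if Original.csv can contain quoted fields with embedded newlines, one text line is no longer one row, and a CSV parser is needed. A minimal sketch that counts parsed records instead of text lines (the method name copy_records is hypothetical):

```ruby
require 'csv'
require 'set'

# Hypothetical helper: copies the records of src whose 1-based record
# numbers appear in wanted into dest, parsing with Ruby's CSV library so a
# quoted field containing a newline still counts as part of one record.
def copy_records(src, dest, wanted)
  wanted = wanted.to_set
  CSV.open(dest, 'w') do |out|
    CSV.open(src) do |csv|
      record_no = 0
      while (row = csv.shift)   # shift returns nil at end of file
        record_no += 1
        out << row if wanted.include?(record_no)
      end
    end
  end
end
```

For example, copy_records('Original.csv', 'New.csv', [100, 200, 300]). The numbers are then record numbers, which differ from line numbers once a field spans lines.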

The code could instead use readlines to slurp the input files and indexes into arrays, but that wouldn't scale. The code above rereads original.csv for each line in index.csv, but it handles files of arbitrary size, which is very important in production environments.
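To put a rough number on that trade-off, take the ten indexes from the question's Index.csv. Assuming each lookup starts again from the top of original.csv, the rereading approach reads k lines for an index k, so the total is the sum of the indexes; a single pass that stops at the largest index reads each line only once:

```ruby
indexes = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

# Rereading approach: index k costs k reads from the top of original.csv.
reread_cost = indexes.sum
# Single pass that stops at the largest wanted line number.
single_pass_cost = indexes.max

puts reread_cost       # 5500
puts single_pass_cost  # 1000
```

The gap grows with both the number of indexes and their size, which is the price paid for never holding either file in memory.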

For instance, if index.csv will be small and unsorted:

require 'set'

File.open('new.csv', 'w') do |new_csv|
  # A Set gives constant-time lookups; sorting the indexes isn't needed.
  indexes = File.readlines('index.csv').map(&:to_i).to_set
  File.foreach('original.csv').with_index(1) do |original_line, original_lineno|
    new_csv.puts original_line if indexes.include?(original_lineno)
  end
end

That will run more quickly because it only iterates through original.csv once, but opens up a potential scalability problem if index.csv grows too big.

the Tin Man

Here is a way to print the lines using the array instead of reading from "Index.csv". It relies on the array being sorted in ascending order.

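array = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
# The next wanted line number; the array must be sorted ascending.
i = array.shift
# File.foreach closes the file when the iteration ends.
File.foreach("Original.csv").with_index(1) do |l, j|
  if j == i
    puts l
    i = array.shift
    break if i.nil? # no more wanted lines, so stop reading
  end
end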
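Since the question also asks to save the rows to New.csv, the same idea can be extended to read the line numbers from Index.csv one at a time while streaming Original.csv, so neither file is held in memory and each is read once. A minimal sketch, assuming Index.csv is sorted in ascending order (the method names read_index and copy_sorted_lines are hypothetical):

```ruby
# Hypothetical helper: reads the next non-blank line number from the index
# file, or returns nil when the index is exhausted.
def read_index(io)
  while (l = io.gets)
    s = l.strip
    return s.to_i unless s.empty?
  end
  nil
end

# Hypothetical helper: copies the lines of original_path whose 1-based line
# numbers are listed in index_path. Assumes index_path is sorted ascending.
def copy_sorted_lines(index_path, original_path, out_path)
  File.open(out_path, 'w') do |out|
    File.open(index_path) do |idx|
      target = read_index(idx)
      File.foreach(original_path).with_index(1) do |line, lineno|
        break if target.nil?      # every wanted line has been written
        while target == lineno    # also handles duplicate index entries
          out.puts line
          target = read_index(idx)
        end
      end
    end
  end
end
```

Usage: copy_sorted_lines('Index.csv', 'Original.csv', 'New.csv'). If the index is not sorted, any entry smaller than one before it is never matched, so it only fits the sorted case raised in the comments.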
sawa