10

I used the spreadsheet gem to do this. It works but it can be very slow at times .I even tried the Roo gem, but that didn't improve the performance. Is there a better way to do this job? The weird thing is that some worksheets in the same excel work faster and some worksheets work very slowly, even taking up to 1 hour.

Can we use open office to open each worksheet(tab) in a single excel and convert them to csv much faster? If yes, how would I do it in ruby?

Or is there an even better solution?

Just adding a small example I tried with Roo gem

xls = Roo::Excel.new(source_excel_file)
xls.each_with_pagename do |name, sheet|
  # p sheet.to_csv(File.join(dest_csv_dir,name + ".csv"))
  #sheet.parse(:clean => true)#.to_csv(File.join(dest_csv_dir,name + ".csv"))
  puts name
  puts sheet.parse(:clean => true)
end
pnuts
  • 58,317
  • 11
  • 87
  • 139
Arunachalam
  • 5,417
  • 20
  • 52
  • 80

4 Answers4

5

Cowardly Preface: I am SUPER new to ruby and know almost nothing of rails, but I have tangled with Excel before. I created a dummy workbook on my local machine with 5 sheets, each containing 10 columns and 1000 rows of randomly-generated numbers. I converted each sheet into its own CSV with this:

require 'win32ole'
require 'csv'

# configure a workbook, turn off excel alarms
xl = WIN32OLE.new('excel.application')
book = xl.workbooks.open('C:\stack\my_workbook.xlsx')
xl.displayalerts = false

# loop through all worksheets in the excel file
book.worksheets.each do |sheet|
  last_row = sheet.cells.find(what: '*', searchorder: 1, searchdirection: 2).row
  last_col = sheet.cells.find(what: '*', searchorder: 2, searchdirection: 2).column
  export = File.new('C:\\stack\\' + sheet.name + '.csv', 'w+')
  csv_row = []

  # loop through each column in each row and write to CSV
  (1..last_row).each do |xlrow|
    (1..last_col).each do |xlcol|
      csv_row << sheet.cells(xlrow, xlcol).value
    end
    export << CSV.generate_line(csv_row)
    csv_row = []
  end
end

# clean up
book.close(savechanges: 'false')
xl.displayalerts = true
xl.quit

An eyeball benchmark for this script was ~30 seconds, with each attempt coming in a few seconds above or below that.

Dan Wagner
  • 2,693
  • 2
  • 13
  • 18
  • but I assume it is for a windows machine .I guess I should have mentioned it but I think it won't work in mac/unix/linux right ? – Arunachalam May 20 '14 at 20:17
  • This one is a good solution for windows users but unfortunately I never use windows machine .I use a Mac and my code run on linux machines . – Arunachalam May 20 '14 at 21:02
  • super stupid question but is it not possible to get win32ole on *nix? – Dan May 23 '14 at 22:00
  • @DanPantry, very unlikely. under the hoods, win32ole is like "use installed microsoft office to do the work". it's very handy when you are on Windows+Office (all the examples in this comment are straightforward translation from Office VBA Help); but in any other environment you just can't get it. – zverok May 30 '14 at 22:54
2

I assume we are talking about old Excel format (xls), it seems spreadsheet gem can't work with xlsx anyways.

I'd give a try to one of command-line spreadhseet converters: either xls2csv from catdoc package (really fast, though not all Excel files are processed successfully) or ssconvert from gnumeric package (moderate speed, and requires to install entrie GNumeric, which sometimes not an option for server, but really robust).

NB: When parsing Excel, roo just requires spreadsheet and wraps it in own API, so it never can be faster or more reliable than spreadsheet.

NB2: If I remember correctly (thought it was some long years ago), trying to automate OpenOffice from ruby was a) really hard and b) really slow.

zverok
  • 1,290
  • 1
  • 9
  • 13
  • Only problem I have with xls2csv is what If I have multiple worksheets in an single excel and want them to be separate CSV . – Arunachalam May 29 '14 at 18:59
  • xls2csv has -b option - "sheet break string". So, if your xls is succesfully parsed with xls2csv (mine are not, so I can't test it), you can either split output file with some more bash magic, like [here](http://stackoverflow.com/questions/11313852/split-one-file-into-multiple-files-based-on-delimiter); or you can capture output into Ruby with `popen`, and just use regular `split` method on it. – zverok May 29 '14 at 22:48
  • Mine neither are getting parsed :( . – Arunachalam May 30 '14 at 18:16
  • give a try to `ssconvert`, then (though, if on server, it provides no option besides install the full gnumeric, which, in its turn, require all gnome libraries) – zverok May 30 '14 at 22:55
2
xls_file = Roo::Excelx.new('test.xlsx')
CSV.open('test.csv') do |csv|
    (2..xls_file.last_row).each do |i| # if you do not need header otherwise (1..xls_file.last_row)
        csv << a.row(i)
    end
end
Shanky Munjal
  • 671
  • 5
  • 18
  • 1
    way too low :D. There is this also as an alternative https://github.com/scpike/excel2csv/blob/master/src/excel2csv.rb But mostly what shanky Munjal said – Lucian Tarna Jul 12 '18 at 15:20
1

Be sure you're using the up-to-date Roo (1.13.2).

Also be sure you're using the patch for skipping trailing blank rows:

https://github.com/Empact/roo/blob/master/lib/roo/worksheet.rb

If you can post one of your spreadsheets that's taking a long time to parse, it may help people here help you. Just be sure to delete any confidential data.

joelparkerhenderson
  • 34,808
  • 19
  • 98
  • 119
  • yes I m using the latest gem . I tried this example with roo gem but it was very slow . – Arunachalam May 27 '14 at 16:26
  • xls = Roo::Excel.new(source_excel_file) xls.each_with_pagename do |name, sheet| # p sheet.to_csv(File.join(dest_csv_dir,name + ".csv")) #sheet.parse(:clean => true)#.to_csv(File.join(dest_csv_dir,name + ".csv")) puts name puts sheet.parse(:clean => true) end – Arunachalam May 27 '14 at 16:26
  • can you give me an example of how to use Roo for multiple sheet in single excel . – Arunachalam May 28 '14 at 16:18