
I have users who cannot, or do not want to, connect to relational databases and instead prefer to work with data exported to Excel files. The recordsets exported from these databases can become rather large. (I also export to CSV files.)

My question is related to this one: Handling java.lang.OutOfMemoryError when writing to Excel from R.

As recommended in the accepted answer to that question (or rather in its first comment), I now use the Rcpp-based openxlsx package to export some views from the database. It works when the export has ~67,000 rows, but it fails for larger datasets (~1 million rows, ~20 columns, all numeric except a few datetimes).

openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx") # 800000 rows

Error: zipping up workbook failed. Please make sure Rtools is installed or a zip application is available to R.
         Try installr::install.rtools() on Windows

(I'm using a Linux PC, and /usr/bin/zip is available to R)

Can I give the openxlsx package more memory? Or set some tuneable options to perform better with large datasets?

Is there something for openxlsx like the options(java.parameters = "-Xmx1000m") setting used for the Java-based xlsx package?
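(For context: that option only affects rJava-based packages such as xlsx, and it has to be set before the JVM is first loaded. openxlsx is C++/Rcpp-based and does not use a JVM, so it presumably has no direct equivalent. A minimal sketch of the xlsx-side setting, with an arbitrary example heap size:)

# Heap size must be set before library(xlsx) (or any rJava package) starts the JVM
options(java.parameters = "-Xmx4g")  # example value; adjust to the machine
library(xlsx)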

The openxlsx vignette does not mention any such options. But maybe there are undocumented ways or options, e.g. showing a progress bar during saving?

At this point I proceed like this: close all unneeded apps, restart RStudio, keep few or no large objects in the global environment, query the database, then run write.xlsx(). With a "clean slate" like this, exporting the 800,000-row dataset to a 93 MB xlsx file succeeded.
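For reference, a minimal sketch of that routine (con and the query are placeholders for my actual database setup):

rm(list = ls())  # start with an empty global environment
invisible(gc())  # collect garbage before the big allocation
data <- DBI::dbGetQuery(con, "SELECT * FROM my_view")  # placeholder connection/query
openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx")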

  • Excel has row limits... https://support.office.com/en-gb/article/Excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3 – cory Mar 15 '16 at 13:09
  • @cory, thanks for linking to the specifications. I was aware that limitations exist but I thought they were higher; in particular I thought that "Worksheet size" was also "Limited by available memory and system resources" but that's obviously a misconception. (Worksheet size limit is 1,048,576 rows by 16,384 columns...) – knb Mar 15 '16 at 13:59
  • 2
    Excel 2007 had a limit of 65,536 rows. Those days were rough... – cory Mar 15 '16 at 14:01
  • can you use `options(error=recover)` or `debug(openxlsx::write.xlsx)` to track down the precise problem? – Ben Bolker Dec 23 '16 at 16:28
  • You can also try Sys.setenv(R_ZIPCMD="/path_to_zip_tool") and see if making that more specific helps. Otherwise take a look at https://stat.ethz.ch/R-manual/R-devel/library/utils/html/zip.html. – Dylan Cross Jan 06 '17 at 15:48
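A minimal sketch combining the two suggestions from the comments above (the zip path is the one available on this Linux box; data stands for the queried recordset):

Sys.setenv(R_ZIPCMD = "/usr/bin/zip")  # point R explicitly at a zip binary
options(error = recover)               # inspect the call stack when the error fires
debug(openxlsx::write.xlsx)            # or step through write.xlsx line by line
openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx")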

1 Answer


Your problem isn't memory. openxlsx requires Rtools (or a similar zip tool) to save larger Excel files.

I had the same problem and the same error you're seeing just yesterday. Here is a link to the Windows installer:

https://cran.r-project.org/bin/windows/Rtools/index.html

The following site further explains the requirements:

https://www.r-project.org/nosvn/pandoc/openxlsx.html
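On Linux/macOS the equivalent requirement is a zip binary that R can find. A quick sketch to check this (Sys.which returns an empty string if nothing is found):

Sys.which("zip")        # path of the zip binary R would use, e.g. "/usr/bin/zip"
Sys.getenv("R_ZIPCMD")  # explicit override, if any; empty means the default "zip"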

  • This is the correct answer based on your error message. You need a version of `zip` that is compatible with `openxlsx`. See also: https://www.r-project.org/nosvn/pandoc/openxlsx.html – TARehman Dec 22 '16 at 15:01
  • I'm on Linux. However, Rtools is available for Windows only – I did not say so explicitly, but implicitly when I wrote "(`/usr/bin/zip` is available to R)" – so this answer does not apply to my use case. Upvoted though – knb Dec 23 '16 at 07:38
  • Thanks for the upvote nonetheless! Idea 1: Is there any chance it's the zip tool that's failing on memory? Odd that you get the "zipping up workbook failed" message. Idea 2: Given it works with a "clean slate", maybe your machine doesn't have the available memory needed. If you watch your memory usage, does it increase and cap out when running your R script? – Dylan Cross Jan 05 '17 at 15:17
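One way to test the second idea from within R is gc()'s "max used" columns (a sketch; assumes the export call from the question):

gc(reset = TRUE)  # reset the "max used" memory counters
openxlsx::write.xlsx(data, file = "data.2008-2016.xlsx")
gc()              # "max used" now reports the peak consumption during the export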