
I tried to load a ~30MB Excel spreadsheet into R using the XLConnect package.

This is what I wrote:

library(XLConnect)
wb <- loadWorkbook("largespreadsheet.xlsx")

And after about 15 seconds, I got the following error:

Error: OutOfMemoryError (Java): GC overhead limit exceeded.

Is this a limitation of the XLConnect package or is there a way to tweak my memory settings to allow for larger files?

I appreciate any solutions/tips/advice.

AME
  • Have you tried other packages? What happened with them? By others, I mean `xlsx` and `RExcel`. Take a look at [this vignette](http://cran.r-project.org/web/packages/xlsx/vignettes/xlsx.pdf) for other resources. – Iterator Nov 01 '11 at 22:50
  • Separately, perhaps you can verify that the issue is most likely due to the spreadsheet? For instance, create a small spreadsheet and test that it works. If that works, I'd then take increasing rectangles from the spreadsheet, copy & paste, and see if a breaking point can be found. Perhaps there's something weird in the spreadsheet. – Iterator Nov 01 '11 at 22:53
  • Good point, the XLConnect package does seem to work with smaller spreadsheets. However, I'm interested in finding a solution to this issue without changing the spreadsheet every time. – AME Nov 09 '11 at 01:25
  • Hypothetically speaking, it could be the case that the spreadsheet has problems that cause parsing issues for XLConnect. In that case, a change to XLConnect's memory allocation wouldn't help. If you can change to a different file type, e.g. XLS or CSV, that may help. – Iterator Nov 09 '11 at 02:04
  • Also, did you try the increasing rectangles method? You can automate the saving of the worksheet via either a COM connection or perhaps a script in Excel. I'd make sure there isn't something funky like brackets/braces, peculiar text (e.g. odd quotes, `<` or `>`), or something else causing problems. Of course the package *should* be able to load the data, but parsing odd formatting is always a bugaboo. – Iterator Nov 09 '11 at 02:38
  • Maybe you can upload an (anonymised) version of the spreadsheet somewhere for us to give it a try? And did you try the xlsx package already? – ROLO Nov 11 '11 at 11:30
  • update: using the RODBC package with .xls seems to work. The issue is I have to save the file as a .xls rather than a .xlsx for this solution to work. Why can RODBC read the file when xlsx and XLConnect cannot? – AME Nov 16 '11 at 03:12
  • Use `readxl` package instead of `XLConnect` to read xls/xlsx files. :) – Maciej Apr 17 '15 at 19:57
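
Following up on the readxl suggestion in the comment above, a minimal sketch (the file name is taken from the question; readxl has no Java dependency, so the JVM heap limit does not apply):

# install.packages("readxl")  # if not already installed
library(readxl)

df <- read_excel("largespreadsheet.xlsx", sheet = 1)  # returns a tibble/data frame
head(df)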

7 Answers

36

Follow the advice from their website:

options(java.parameters = "-Xmx1024m")
library(XLConnect)
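
To check that the setting actually took effect, XLConnect's xlcMemoryReport() (mentioned in the comments further down) reports the free JVM memory. A minimal sketch, assuming the option is set before any rJava-based package has started the JVM:

options(java.parameters = "-Xmx1024m")  # must run before the JVM is started
library(XLConnect)

xlcMemoryReport()  # prints the amount of free memory in the JVM

wb <- loadWorkbook("largespreadsheet.xlsx")  # file name from the question
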
Richie Cotton
Henrico
  • Even after setting these options, I run into this error - Error: OutOfMemoryError (Java): Java Heap Space. – AME Nov 01 '11 at 16:17
  • I've had the out of memory problem several times now as Ubuntu updates Java or the package rJava is updated. Your options() solution has worked for me when configured for more memory, as has the `ulimit -c unlimited` command in the shell. But I'm done chasing this problem. I've migrated to openxlsx, which drops any dependence on Java and relies instead on C++. I've never looked back. – Brad Horn Sep 08 '15 at 06:57
  • @BradHorn I encountered this problem and tried different solutions. I was sure it was a java problem because my file was very small, and none of the java based packages solved the problem, till I saw your comment and tried openxlsx. openxlsx is the easiest solution to try to see if it is a data related problem or java related problem. I think you should put your comment as an answer because people might not notice a comment. – Joswin K J Dec 16 '15 at 12:54
34

If you still have problems importing XLSX files, you can use this option. The answer with "-Xmx1024m" didn't work for me, so I changed it to "-Xmx4g".

options(java.parameters = "-Xmx4g" )
library(XLConnect)

This link was useful.

Maciej
  • This is the first solution I found that works for me. Evidently it requires having at least 4 free gigs of RAM: http://www.bramschoenmakers.nl/en/node/726 – zkurtz Jul 25 '13 at 14:02
  • Didn't work at the end for me - I got an error stating that the Garbage Collector overhead was too large. I got Excel to export to CSV and then used the native read.csv with sep=";" – Jochen van Wylick Jan 28 '15 at 13:33
  • Even after running these commands I still have the same error. I'm running r 3.1.1 on x86_64-apple-darwin13.1.0. – kilojoules Apr 17 '15 at 18:43
  • If you still run into the error after setting this, you should consider using a different format. – Joris Meys Apr 12 '16 at 08:04
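
Following up on the last two comments: if the JVM keeps running out of memory, a common fallback is to export the sheet from Excel as CSV and read it with base R. A sketch, assuming a semicolon-delimited export and a hypothetical file name:

# Excel's "CSV (semicolon delimited)" export read with base R
df <- read.csv("largespreadsheet.csv", sep = ";", stringsAsFactors = FALSE)
# read.csv2() is the equivalent shortcut for sep = ";", but it also assumes dec = ","
str(df)
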
16

Use read.xlsx() in the openxlsx package. It has no dependency on rJava, so it only has the memory limitations of R itself. I have not explored its writing and formatting of XLSX files in much depth, but it has some promising-looking vignettes. For reading large spreadsheets, it works well.

Hat tip to @Brad-Horn. I've just turned his comment into an answer because I also found this to be the best solution!
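
A minimal sketch of reading with openxlsx (the file name is taken from the question; the sheet index is an assumption):

# install.packages("openxlsx")
library(openxlsx)  # C++ based, no rJava dependency

df <- read.xlsx("largespreadsheet.xlsx", sheet = 1)  # returns a data.frame
dim(df)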

vpipkt
  • Just to note: this package does not work well on Windows. It depends on Perl instead of Java and so one has to bother with installing that. – CoderGuy123 Nov 14 '16 at 04:38
  • Indeed the openxlsx package is good. However, it doesn't support opening password protected workbooks. – avriis Sep 21 '17 at 11:01
  • Is there a package that does that? How would it get the password? Is the password stored in a script, environment variable? Is there a prompt for it? – vpipkt Sep 21 '17 at 12:57
  • Contrary to the xlsx package, this one (openxlsx) works, and works much faster. Required no change in code in my case. – cineS. Feb 22 '22 at 14:31
16

In case someone encounters this error when reading not one huge file but many smaller files: I managed to solve it by freeing the Java Virtual Machine memory with xlcFreeMemory(), thus:

files <- list.files(path, pattern = "*.xlsx")
for (i in seq_along(files)) {
    wb <- loadWorkbook(...)
    ...
    rm(wb)
    xlcFreeMemory()  # <= free Java Virtual Machine memory !
}
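
For a self-contained variant of the loop above (the folder path and the idea of collecting each first sheet into a list are assumptions; readWorksheet() is XLConnect's reader):

library(XLConnect)

files <- list.files("data", pattern = "\\.xlsx$", full.names = TRUE)  # hypothetical folder
results <- vector("list", length(files))

for (i in seq_along(files)) {
    wb <- loadWorkbook(files[i])
    results[[i]] <- readWorksheet(wb, sheet = 1)  # read the first sheet as a data.frame
    rm(wb)
    xlcFreeMemory()  # release JVM memory before the next file
}
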
  • This worked for me! I have tried setting `java.parameters` but it didn't work. – syd Aug 16 '17 at 21:35
  • Great find. My issue was with writing and re-writing the same file (a 400K xlsx file when all finished). – Dan Nov 02 '17 at 02:06
  • Also, you can see the available memory using xlcMemoryReport(). – aliawadh980 Jan 17 '18 at 09:24
  • Excellent, I wish more people will upvote this answer, to find it at the top of the page. It is the only answer that solves the problem. – Adrian Jul 20 '22 at 16:08
4

Whenever you are using a library that relies on rJava (such as RWeka in my case), you are bound to hit the default heap space limit (512 MB) some day. When you are using Java, we all know the JVM argument to use (-Xmx2048m if you want 2 gigabytes of RAM). Here it's just a matter of how to specify it in the R environment.

options(java.parameters = "-Xmx2048m")
library(rJava)
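
To confirm from R that the new limit is in effect, one option is to ask the JVM itself via rJava. A sketch, assuming a fresh session (maxMemory() reports bytes, and the JVM typically shows slightly less than the requested -Xmx value):

options(java.parameters = "-Xmx2048m")  # must be set before the JVM starts
library(rJava)
.jinit()  # start the JVM with the parameters set above

max_mb <- J("java.lang.Runtime")$getRuntime()$maxMemory() / 1024^2
max_mb  # should be roughly 2048
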
Rahul Pandey
  • When I try this: `options(java.parameters = "-Xmx20g"); library(rJava); xlcMemoryReport()` it still reports: Amount of free memory in the Java Virtual Machine (JVM): 451.2915 MB – Michael Tuchman Mar 10 '20 at 20:59
4

As suggested here, make sure to run the options() call on the first line of your code. In my case, it worked only after I restarted the R session and ran it on the first line.

options(java.parameters = "-Xmx4g" )
library(XLConnect)
3

This appears to be the case when you keep using the same R session over and over again without restarting RStudio. Restarting RStudio can help to allocate a fresh memory heap to the program. It worked for me right away.

Lycone