0

I am trying to import a 128M .xlsx file into R, and am having no end of trouble.

Here is the code I am using:

memory.limit(size = 32292)
setwd("C:\\Users\\Administrator\\Desktop\\R Programming")
getwd()
install.packages("rJava")
library(rJava)
options(java.parameters = "-Xmx4g" )
install.packages("xlsx")
library(xlsx)
datafile <- read.xlsx("big honking datafile.xlsx", 1)

And here are the results:

> memory.limit(size = 32292)
[1] 32292
> setwd("C:\\Users\\Administrator\\Desktop\\R Programming")
> getwd()
[1] "C:/Users/Administrator/Desktop/R Programming"
> install.packages("rJava")
trying URL 'http://mran.revolutionanalytics.com/snapshot/2014-10  01/bin/windows/contrib/3.1/rJava_0.9-6.zip'
Content type 'application/zip' length 759547 bytes (741 Kb)
opened URL
downloaded 741 Kb

package ‘rJava’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Administrator\AppData\Local\Temp\2\Rtmpuacxly\downloaded_packages
> library(rJava)
> #options(java.parameters = "-Xmx1000m")
> options(java.parameters = "-Xmx4g" )
> install.packages("xlsx")
trying URL 'http://mran.revolutionanalytics.com/snapshot/2014-10 01/bin/windows/contrib/3.1/xlsx_0.5.7.zip'
Content type 'application/zip' length 400740 bytes (391 Kb)
opened URL
downloaded 391 Kb

package ‘xlsx’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Administrator\AppData\Local\Temp\2\Rtmpuacxly\downloaded_packages
> library(xlsx)
Loading required package: xlsxjars
> datafile <- read.xlsx("big honking datafile.xlsx", 1) # read first sheet
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.lang.OutOfMemoryError: Java heap space

I know this is a large file, but I've used larger .csv files before. In fact, when this file is externally converted to .csv, I have no issues.

Anyone have any suggestions (aside from always working in .csv, and never having to convert)?

Thanks!

rucker
  • 393
  • 3
  • 13
  • 1
    There appear to be several similar questions, did you try the suggestions found in http://stackoverflow.com/questions/12625481/memoryerror-in-r-while-read-xlsx or http://stackoverflow.com/questions/21937640/handling-java-lang-outofmemoryerror-when-writing-to-excel-from-r ? – Ista Jan 17 '15 at 02:37
  • 2
    Use `openxlsx` package. It doesn't depend on Java and you never have to worry about errors like `java.lang.OutOfMemoryError: Java heap space`. Only downside is it doesn't handle `xls` files. – Khashaa Jan 17 '15 at 02:48
  • 1
    From my experience, reading Excel files with `xlsx` or `XLConnect` uses much more memory than reading a `csv` file with similar size. With about 2 GB of available memory, I could not read an Excel file of about 100 MB. – Stibu Jan 17 '15 at 07:34
  • 1
    This is caused by the fact that XLConnect needs to copy your entire data object over to the JVM in order to write it to a file and the JVM has to be initialized with a fixed upper limit on its memory size. To change this amount, you can pass parameters to the R's JVM just like you can to a command line Java process via rJava's options support: options(java.parameters = "-Xmx1024m") library(XLConnect) – lawyeR Jan 17 '15 at 11:06
  • Thanks, everyone! Appreciate the suggestions. These give me several different paths to try. Sorry about the repeat question, although to be fair, the linked answer didn't come up when I searched. Perhaps I was using different terms. Thanks again! – rucker Jan 17 '15 at 12:34

0 Answers0