100

Please can someone help me on the best way to import an excel 2007 (.xlsx) file into R. I have tried several methods and none seems to work. I have upgraded to 2.13.1, windows XP, xlsx 0.3.0, I don't know why the error keeps coming up. I tried:

AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx","DNA_Tag_Numbers")

OR

AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx",1)

but I get the error:

 Error in .jnew("java/io/FileInputStream", file) : 
  java.io.FileNotFoundException: C:\AB_DNA_Tag_Numbers.xlsx (The system cannot find the file specified)

Thank you.

joran
  • 169,992
  • 32
  • 429
  • 468
nolyugo
  • 1,451
  • 3
  • 12
  • 12
  • 4
    How many times are you doing this? If just once, it's almost always best to use Excel to export to a more open format, like `.csv`. – Ari B. Friedman Aug 13 '11 at 08:09
  • gsk3: Yes, or Google Docs, which has had Excel 2007 import support since '09: http://www.google.com/support/forum/p/Google%20Docs/thread?tid=143b9e598d8de9b5&hl=en (Hopefully by now it works well.) – Benjamin Atkin Aug 13 '11 at 08:15
  • 6
    What version of R is this? What operating system? What version of read.xlsx? Have you read the posting guide? – Spacedman Aug 13 '11 at 10:00
  • 2
    Can you explain how this question is different from http://stackoverflow.com/questions/6099243/read-an-excel-file-directly-from-a-r-script/6099497#6099497 – Chase Aug 13 '11 at 13:36
  • I have upgraded to 2.13.1, windows XP, xlsx 0.3.0, I don't know why the error keeps coming up..... Error in .jnew("java/io/FileInputStream", file) : java.io.FileNotFoundException: C:\AB_DNA_Tag_Numbers.xlsx (The system cannot find the file specified) – nolyugo Aug 13 '11 at 18:52
  • @Chase I guess the main difference is that the OP wants to read .xlsx files, the new format used by MS Excel. – Gavin Simpson Aug 13 '11 at 20:18
  • 9
    You have checked that R is actually able to find the file, e.g. `file.exists("C:/AB_DNA_Tag_Numbers.xlsx")` ? – Ben Bolker Aug 14 '11 at 23:05

16 Answers16

117

For a solution that is free of fiddly external dependencies*, there is now readxl:

The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies so it's easy to install and use on all operating systems. It is designed to work with tabular data stored in a single sheet.

Readxl supports both the legacy .xls format and the modern xml-based .xlsx format. .xls support is made possible the with libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the RapidXML C++ library.

It can be installed like so:

install.packages("readxl") # CRAN version

or

devtools::install_github("hadley/readxl") # development version

Usage

library(readxl)

# read_excel reads both xls and xlsx files
read_excel("my-old-spreadsheet.xls")
read_excel("my-new-spreadsheet.xlsx")

# Specify sheet with a number or name
read_excel("my-spreadsheet.xls", sheet = "data")
read_excel("my-spreadsheet.xls", sheet = 2)

# If NAs are represented by something other than blank cells,
# set the na argument
read_excel("my-spreadsheet.xls", na = "NA")

* not strictly true, it requires the Rcpp package, which in turn requires Rtools (for Windows) or Xcode (for OSX), which are dependencies external to R. But they don't require any fiddling with paths, etc., so that's an advantage over Java and Perl dependencies.

Update There is now the rexcel package. This promises to get Excel formatting, functions and many other kinds of information from the Excel file and into R.

Ben
  • 41,615
  • 18
  • 132
  • 227
  • Thanks! I was getting this error continuosly i.e. 'could not find function read_excel' & loading the file in this way fixed my issue. – Sanushi Salgado Jun 26 '21 at 14:03
35

You may also want to try the XLConnect package. I've had better luck with it than xlsx (plus it can read .xls files too).

library(XLConnect)
theData <- readWorksheet(loadWorkbook("C:/AB_DNA_Tag_Numbers.xlsx"),sheet=1)

also, if you are having trouble with your file not being found, try selecting it with file.choose().

Ian Fellows
  • 17,228
  • 10
  • 49
  • 63
  • 7
    Rather than `readWorksheet(loadWorkbook(...))` you can do the same a little more succinctly with `readWorksheetFromFile(...)`. – Hugh Jun 04 '14 at 03:16
  • This worked on one of my troublesome sheets that read.xlsx crashed on with "java.lang.IllegalArgumentException: Cell index must be >= 0" – ski_squaw Mar 03 '15 at 23:02
23

I would definitely try the read.xls function in the gdata package, which is considerably more mature than the xlsx package. It may require Perl ...

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
21

Update

As the Answer below is now somewhat outdated, I'd just draw attention to the readxl package. If the Excel sheet is well formatted/lain out then I would now use readxl to read from the workbook. If sheets are poorly formatted/lain out then I would still export to CSV and then handle the problems in R either via read.csv() or plain old readLines().

Original

My preferred way is to save individual Excel sheets in comma separated value (CSV) files. On Windows, these files are associated with Excel so you don't loose the double-click-open-in-Excel "feature".

CSV files can be read into R using read.csv(), or, if you are in a location or using a computer set up with some European settings (where , is used as the decimal place), using read.csv2().

These functions have sensible defaults that makes reading appropriately formatted files simple. Just keep any labels for samples or variables in the first row or column.

Added benefits of storing files in CSV are that as the files are plain text they can be passed around very easily and you can be confident they will open anywhere; one doesn't need Excel to look at or edit the data.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • 6
    The csv would not help since the workbook contains several worksheets and each has upto 100k rows. Thanks – nolyugo Aug 13 '11 at 08:27
  • 7
    I did say save the individual sheets as CSV files - as these are plain text, the size of the worksheets is irrelevant. If you insist on working with Excel workbooks then the available options for reading the data in to R become far more complex - see packages RODBC, RDCOM for alternatives. Finally, if you are certain you have followed the instructions for using `read.xlsx()` **and** have the latest versions of that package and R, then email the package maintainer to report a potential bug. – Gavin Simpson Aug 13 '11 at 09:17
  • It really depends on the data and the level of interoperability that your require between os. As soon as there is non-ascii in your labels or factors AND you need to work accross Mac/Win/Lin, assorted weirdness will start to occur if you export/import to and from excel. Excel cannot handle utf-8 gracefully (or csv for that matter). So in that case either you stay in Excel, or move to a different spreadsheet application (the latter is not always possible). – FvD Mar 10 '14 at 18:13
20

Example 2012:

library("xlsx")
FirstTable <- read.xlsx("MyExcelFile.xlsx", 1 , stringsAsFactors=F)
SecondTable <- read.xlsx("MyExcelFile.xlsx", 2 , stringsAsFactors=F)
  • I would try 'xlsx' package for it is easy to handle and seems mature enough
  • worked fine for me and did not need any additionals like Perl or whatever

Example 2015:

library("readxl")
FirstTable  <- read_excel("MyExcelFile.xlsx", 1)
SecondTable <- read_excel("MyExcelFile.xlsx", 2)
  • nowadays I use readxl and have made good experience with it.
  • no extra stuff needed
  • good performance
petermeissner
  • 12,234
  • 5
  • 63
  • 63
15

This new package looks nice http://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf It doesn't require rJava and is using 'Rcpp' for speed.

RockScience
  • 17,932
  • 26
  • 89
  • 125
5

If you are running into the same problem and R is giving you an error -- could not find function ".jnew" -- Just install the library rJava. Or if you have it already just run the line library(rJava). That should be the problem.

Also, it should be clear to everybody that csv and txt files are easier to work with, but life is not easy and sometimes you just have to open an xlsx.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 1
    when I load version 2.8.2 of `gdata` I get the startup message `gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.` Sounds to me like it should open xlsx files, although admittedly I haven't tested it recently ... – Ben Bolker Apr 09 '12 at 22:43
  • 1
    I can confirm that `gdata` version 2.8.2 reads `xlsx` files with the `read.xls` function. – Ben May 05 '12 at 05:44
5

For me the openxlx package worked in the easiest way.

install.packages("openxlsx")
library(openxlsx)
rawData<-read.xlsx("your.xlsx");
Henrik Bata
  • 51
  • 1
  • 2
4

I recently discovered Schaun Wheeler's function for importing excel files into R after realising that the xlxs package hadn't been updated for R 3.1.0.

https://gist.github.com/schaunwheeler/5825002

The file name needs to have the ".xlsx" extension and the file can't be open when you run the function.

This function is really useful for accessing other peoples work. The main advantages over using the read.csv function are when

  • Importing multiple excel files
  • Importing large files
  • Files that are updated regularly

Using the read.csv function requires manual opening and saving of each Excel document which is time consuming and very boring. Using Schaun's function to automate the workflow is therefore a massive help.

Big props to Schaun for this solution.

Jonno Bourne
  • 1,931
  • 1
  • 22
  • 45
2

What's your operating system? What version of R are you running: 32-bit or 64-bit? What version of Java do you have installed?

I had a similar error when I first started using the read.xlsx() function and discovered that my issue (which may or may not be related to yours; at a minimum, this response should be viewed as "try this, too") was related to the incompatability of .xlsx pacakge with 64-bit Java. I'm fairly certain that the .xlsx package requires 32-bit Java.

Use 32-bit R and make sure that 32-bit Java is installed. This may address your issue.

2

You have checked that R is actually able to find the file, e.g. file.exists("C:/AB_DNA_Tag_Numbers.xlsx") ? – Ben Bolker Aug 14 '11 at 23:05

Above comment should've solved your problem:

require("xlsx")
read.xlsx("filepath/filename.xlsx",1) 

should work fine after that.

2

I have tried very hard on all the answers above. However, they did not actually help because I used a mac. The rio library has this import function which can basically import any type of data file into Rstudio, even those file using languages other than English!

Try codes below:

    library(rio)
    AB <- import("C:/AB_DNA_Tag_Numbers.xlsx")
    AB <- AB[,1]

Hope this help. For more detailed reference: https://cran.r-project.org/web/packages/rio/vignettes/rio.html

Suyang Xu
  • 21
  • 1
  • 3
1

As stated by many here, I am writing the same thing but with an additional point!

At first we need to make sure that our R Studio has these two packages installed:

  1. "readxl"
  2. "XLConnect"

In order to load a package in R you can use the below function:

install.packages("readxl/XLConnect")
library(XLConnect)
search()

search will display the list of current packages being available in your R Studio.

Now another catch, even though you might have these two packages but still you may encounter problem while reading "xlsx" file and the error could be like "error: more columns than column name"

To solve this issue you can simply resave your excel sheet "xlsx" in to

"CSV (Comma delimited)"

and your life will be super easy....

Have fun!!

Rajeev Barnwal
  • 1,349
  • 11
  • 14
1

You may be able to keep multiple tabs and more formatting information if you export to an OpenDocument Spreadsheet file (ods) or an older Excel format and import it with the ODS reader or the Excel reader you mentioned above.

Benjamin Atkin
  • 14,071
  • 7
  • 61
  • 60
0

The installation of xlsx package require rJava and xlsxjars. Indirectly they require the specific (32 or 64 bit) java runtime environment on the system.

Pro of read.xlsx: In the same package there are read.xlsx and write.xlsx

Con: Very low speed

As suggested, the easy way is to save in .csv format from excel.

Simple benchmark on a 5800x15 dataset (median)

  1. read.xlsx: >10000ms
  2. read_xlsx: 70ms
  3. read.csv: 15ms
Alessio
  • 93
  • 1
  • 7
0

You can read a xlsx or xls file with the following approach. This approach only works on Windows.

library(RDCOMClient)

path_Excel_File <- "D:/empty_Excel_File.xls"

xlApp <- COMCreate("Excel.Application")
xlApp[["Visible"]] <- TRUE
xlWbk <- xlApp$Workbooks()$Open(path_Excel_File)
Sheets <- xlWbk$Sheets()

Mat_Val <- matrix(NA, nrow = 2, ncol = 2)

for(i in 1 : 2)
{
  for(j in 1 : 2)
  {
    Mat_Val[i, j] <- Sheets[[1]]$Cells(i, j)$Value()
  }
}
Emmanuel Hamel
  • 1,769
  • 7
  • 19