23

I'm trying to open an .xlsb file in R and keep getting similar errors.

Any recommendations on how to solve this issue without having to download the data and save it in a different formate?

read.xlsx(paste(OutputLocation,FileName, sep=""), sheetName = "Master Data")

Error messages: Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : org.apache.poi.xssf.XLSBUnsupportedException: .XLSB Binary Workbooks are not supported

rigDataWB<-loadWorkbook(paste(OutputLocation,FileName, sep=""))

Error messages: Error: XLSBUnsupportedException (Java): .XLSB Binary Workbooks are not supported

Please note:

I cannot install Perl libraries.

I'm running 64bit R.

Reference: http://www.milanor.net/blog/?p=779

My data is from: http://phx.corporate-ir.net/phoenix.zhtml?c=79687&p=irol-reportsother

user2946746
  • 1,740
  • 3
  • 21
  • 36

5 Answers5

16

Use the RODBC package:

library(RODBC)
wb <- "D:\\Data\\Masked Data.xlsb" # Give the file name
con2 <- odbcConnectExcel2007(wb)
data <- sqlFetch(con2, "Sheet1$") # Provide name of sheet
nrow(data)
OTStats
  • 1,820
  • 1
  • 13
  • 22
6

One way could be to use ODBC:

require(RODBC)
if (any(grepl("*.xlsb", odbcDataSources(), fixed = TRUE))) {
  download.file(url = "http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NTcwMjI1fENoaWxkSUQ9MjcxMjIxfFR5cGU9MQ==&t=1", 
                destfile = file.path(tempdir(), "test.xlsb"), 
                mode = "wb")
  conn <- odbcConnectExcel2007( file.path(tempdir(), "test.xlsb")) 
  df <- sqlFetch(conn, sub("'(.*)\\$'", "\\1", sqlTables(conn)$TABLE_NAME)[4]) # read 4th sheet in the table name list
  head(df, 10)
  #                                             F1          F2         F3       F4        F5 F6
  # 1                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 2                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 3                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 4                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 5  Baker Hughes Gulf of Mexico Oil / Gas Split        <NA>       <NA>     <NA>      <NA> NA
  # 6                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 7                                         <NA> US Offshore Total\nGoM Gas\nGoM Oil \nGoM NA
  # 8                                       1/7/00         127        123      116         7 NA
  # 9                                      1/14/00         125        121      116         5 NA
  # 10                                     1/21/00         125        121      116         5 NA
  close(conn) 
}
lukeA
  • 53,097
  • 5
  • 97
  • 100
  • 3
    Just a quick note, if you're getting an error, update your Windows drivers [here](https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=13255) and the solution works. – hubbs5 Nov 03 '16 at 16:36
  • 1
    Hi. Is this answer obsolete now? There doesn't seem to be a `odbcConnectExcel2007` in `RODBC`. Or is this Windows only? – user2957945 Sep 15 '21 at 11:56
  • I'm using windows and `odbcConnectExcel2007 ` does exist. – Fabian Pino Feb 08 '23 at 20:21
3

readxlsb package can read Excel binary (.xlsb) files into R. Here are some info taken from the package vignettes:

read_xlsb(path, sheet, range, col_names, col_types, na, trim_ws, skip, ...)

sheet:

Either a name, or the index of the sheet to read. Index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied if range is a named range, then this argument is ignored

range:

range can be specified as

  • A named range. Named ranges are not case sensitive
  • In Sheet!A1 notation
  • In Sheet!R1C1 notation
  • As a cellranger::cell_limits object

col_names

  • TRUE: The first row is used for column names. Empty cells result in a column name of the form ‘column.i’
  • FALSE: Column names will be ‘column.i’
  • Character vector: vector containing column names.

col_types

Can be implied from the spreadsheet or specified in advanced. When specifying types, options are

  • “logical” (or “boolean”), “numeric” (or “double”), “integer”, “date” and “string” (or “character”)
  • Use “skip” (or “ignore”) to skip a column

na

A character string that is interpret as NA. This does not effect the implied data type for a column.

trim_ws

Should leading and trailing whitespaces be trimmed from character strings?

skip

The number of rows to skip before reading data.

library(readxlsb)

res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"), 
                range = "PORTFOLIO", 
                debug = TRUE)

ls(res$env)
#> [1] "content"      "named_ranges" "sheets"       "stream"

res$env$named_ranges
#>             name                     range sheet_idx first_column first_row
#> 1   INFO_RELEASE          FirstSheet!$A$11         0            1        11
#> 2        OUTLOOK 'My SecondTab'!$A$1:$C$13         1            1         1
#> 3      PORTFOLIO      FirstSheet!$A$3:$C$9         0            1         3
#> 4 SAVED_DATETIME          FirstSheet!$C$13         0            3        13
#> 5          TITLE           FirstSheet!$A$1         0            1         1
#>   last_column last_row
#> 1           1       11
#> 2           3       13
#> 3           3        9
#> 4           3       13
#> 5           1        1

Created on 2020-07-07 by the reprex package (v0.3.0)

Tung
  • 26,371
  • 7
  • 91
  • 115
  • 5
    This package is eye-wateringly slow for XLSB files of any meaningful size, and often does not complete successfully. Not that there is a real packaged alternative, but users should be mindful of this when they write code that depends on `readxlsb`. – tchakravarty Aug 23 '20 at 17:29
  • 1
    This package doesn't work and freezes the R Studio. – Adnan Hajizada Aug 25 '20 at 03:16
  • 1
    This is the only answer that worked for me on linux. Very slow for 100k records but got there. – user2957945 Sep 15 '21 at 12:06
  • This package has very limited features, for instance I cant skip NA or Blank cells while reading. – Duffer Oct 06 '21 at 22:42
  • @Duffer: you can try Python packages ([pyxlsb](https://github.com/willtrnr/pyxlsb) or [pyxlsb2](https://github.com/DissectMalware/pyxlsb2) via [reticulate](https://rstudio.github.io/reticulate/)) to work around some of the limitations of the `readxlsb` package. I haven't tested them though. – Tung Oct 09 '21 at 21:43
1

If you get the following error in R trying to connect to .xlsb:

[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified

then, you are probably are missing to install the AccessDatabaseEngine_X64.exe from Microsoft. I had this problem today, and after installing this file I've got no more error messages.

double-beep
  • 5,031
  • 17
  • 33
  • 41
braga461
  • 31
  • 3
0

try

xl.read.file(file_path, h=T, top.left.cell = "A1") from library(excel.link) 
Ruli
  • 2,592
  • 12
  • 30
  • 40
Sandeep
  • 1
  • 2