0

I'm trying to read an Excel file with over 30 tabs of data. The complication is that each tab actually has 2 tables in it. There is a table at the top of the sheet, then a few blank rows, then a second table below with completely different column titles.

I'm aware of the openxlsx and readxl packages, but they seem to assume that the Excel data is formatted into tidy tables.

If I can get the raw data into R (perhaps in a text matrix...), I'm confident I can do the dirty work of parsing it into data frames. Any advice? Many thanks.

Artem
  • 3,304
  • 3
  • 18
  • 41
Arthur
  • 1,248
  • 8
  • 14
  • 1
    Welcome to StackOverflow! For code debugging please always ask with a [reproducible](https://stackoverflow.com/q/5963269/1422451) example per the [MCVE](https://stackoverflow.com/help/mcve) and [`r`](https://stackoverflow.com/tags/r/info) tag description. It's not clear what the problem is. – Hack-R Aug 20 '18 at 14:21
  • 1
    Perhaps with `tidyxl`? See also `jailbreakr`. – Stéphane Laurent Aug 20 '18 at 14:51
  • I was able to solve my problem by using `read_xlsx` from the `readxl` package, and setting the option `col_names = FALSE`. – Arthur Oct 30 '18 at 12:43

1 Answers1

0

you can use XLConnect package to access arbitrary region in Excel Worksheet. Then you can extract list of data frames. Please see below:

Simulation:

library(XLConnect)

# simulate xlsx-file
df1 <- data.frame(x = 1:10, y = 0:9)
df2 <- data.frame(x = 1:20, y = 0:19)
wb <- loadWorkbook("temp.xlsx", create = TRUE )
createSheet(wb, "sh1")
writeWorksheet(wb, df1, "sh1", startRow = 1)
writeWorksheet(wb, df2, "sh1", startRow = 15)
lapply(2:30, function(x) cloneSheet(wb, "sh1", paste0("sh", x)))
saveWorkbook(wb)

Excel

Extract Data

# read.data
wb <- loadWorkbook("temp.xlsx")
df1s <- lapply(1:30, function(x) readWorksheet(wb, x, startRow = 1, endRow = 11))
df2s <- lapply(1:30, function(x) readWorksheet(wb, x, startRow = 15, endRow = 35))
df1s[[1]]
df2s[[2]]

Output data.frame #1 from the first sheet and data.frame #2 from the second one:

> df1s[[1]]
    x y
1   1 0
2   2 1
3   3 2
4   4 3
5   5 4
6   6 5
7   7 6
8   8 7
9   9 8
10 10 9
> df2s[[2]]
    x  y
1   1  0
2   2  1
3   3  2
4   4  3
5   5  4
6   6  5
7   7  6
8   8  7
9   9  8
10 10  9
11 11 10
12 12 11
13 13 12
14 14 13
15 15 14
16 16 15
17 17 16
18 18 17
19 19 18
20 20 19
Artem
  • 3,304
  • 3
  • 18
  • 41