0

I'm importing and appending hundreds of Excel spreadsheets into R using map_dfr in combination with a user-defined function:

Function to import specific columns in each worksheet:

fctn <- function(path){map_dfc(.x = c(1,2,3,7,10,11,12,13), ~ read.xlsx(path,
                                                                        sheet=1,
                                                                        startRow = 7,
                                                                        colNames = FALSE,
                                                                        cols = .x))}

Code to pull all the files in the "path" and append them, where file.list is the list of paths and files to import:

all.files <- map_dfr(file.list, ~ fctn(path=.x))

My problem is, some of these sheets have missing values in some of the columns, but not others, and R doesn't like that. I encounter this error, for instance:

"Error: can't recycle '..1' (size 8) to match '..2' (size 6)", which happens because column 2 is missing information in two cells.

Is there any way to make R accept missing values in cells?

Phil
  • 7,287
  • 3
  • 36
  • 66
so03tw
  • 1
  • 1
  • 1
    Is there a particular reason you are reading each `path` eight times, one column each time, and outputting them as a `dfc`? While not just `read.xlsx(..., cols=c(1,2,3,...,13))` *once*? (Or perhaps that `.x=c(...)` is actually a list-column in a bigger frame, so it is better represented as `.x = list(c(...))`?) – r2evans Oct 15 '21 at 19:22
  • I'm pretty new to R (been a SAS coder for many years) and this was the method I cobbled together! Always open to suggestions if it's too clunky. – so03tw Oct 15 '21 at 20:12
  • My suggestion is to use `fctn <- function(path) read.xlsx(path, sheet=1, startRow=7, colNames=FALSE, cols=c(1:3, 7, 10:13))`. As for your error, it's hard to give clear advise without the data that is causing the error; since it's about recycling, it suggests that removing `map_dfc` might get rid of your error, suggesting that some of the columns you are picking have empty cells. – r2evans Oct 16 '21 at 09:02
  • Thanks, however each spreadsheet I'm importing is potentially a different size, so I can't specify blocks of cells, only rows. And I think that's my problem. I was hoping there was an option I hadn't discovered yet in the read.xlsx function to treat missings as something else... – so03tw Oct 16 '21 at 12:10
  • I don't know anything about the spreadsheets, so there's little I can do to well-advise. Please read about how to make questions ***reproducible***: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! – r2evans Oct 16 '21 at 12:11
  • 1
    So it turns out the problem was the map_dfc, I removed it and all is fine. Thanks for that! I was just over-engineering the solution. – so03tw Oct 16 '21 at 13:08

0 Answers0