I'm writing a function read_list_if whose inputs are:

  • a list files_list of files to read
  • a function read_func to read each file
  • and optionally a function select_func to skip files which don't satisfy a certain boolean condition.

The full code is

read_func <- function(...){
  read_csv(..., 
           col_types = cols(
             .default= col_integer()),
           col_names = TRUE)
}


read_list_if <- function(files_list, read_func, select_func = NULL, ...){

  if (is.null(select_func)) {

    read_and_assign <- function(dataset, read_func, ...){
      dataset_name <- as.name(dataset)
      dataset_name <- read_func(dataset, ...)
      return(dataset_name)
    }

  } else

    read_and_assign <- function(dataset, read_func, select_func, ...){
      dataset_name <- as.name(dataset)
      dataset_name <- read_func(dataset,...)
      if (select_func(dataset_name)) {
        return(dataset_name)
      } 
      else return(NULL)
    }

  # invisible is used to suppress the unneeded output
  output <- invisible(
    sapply(files_list,
           read_and_assign, read_func = read_func, 
           select_func = select_func, ..., 
           simplify = FALSE, USE.NAMES = TRUE))

}


library(readr)
files <- list.files(pattern = "*.csv")
datasets <- read_list_if(files, read_func)

Save the code in a script (e.g., test.R) in the same directory with at least one .csv file (even an empty one, created with touch foo.csv, will work). If you now source("test.R"), you get the error:

Error in read_csv(..., col_types = cols(.default = col_integer()), col_names = TRUE) : 
  unused argument (select_func = NULL)

The weird thing is that if there is no .csv file in the directory, then no error is produced. I guess this happens because, when the first argument to sapply, i.e. files_list, is an empty list, then the rest of the arguments are not evaluated (R lazy evaluation).
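
The empty-directory case can be checked without any files on disk. The sketch below is only an illustration: `strict_reader` is a hypothetical stand-in for a reader that accepts nothing but a path. It suggests that `sapply` never calls its function when the input has length zero, so the extra `select_func = NULL` argument is never forwarded and cannot raise an error.

```r
# Hypothetical stand-in for a reader that accepts only a path
strict_reader <- function(path) {
  stop("strict_reader was called")
}

# Zero-length input: the function is never invoked, so nothing fails
res_empty <- sapply(character(0), strict_reader, select_func = NULL,
                    simplify = FALSE, USE.NAMES = TRUE)
length(res_empty)

# One element: the function is invoked with select_func = NULL,
# which its signature cannot accept, so the call errors
res_one <- tryCatch(
  sapply("foo.csv", strict_reader, select_func = NULL,
         simplify = FALSE, USE.NAMES = TRUE),
  error = function(e) conditionMessage(e)
)
res_one
```

In the second call the error is raised while matching arguments, before the body of `strict_reader` runs, which is why the message mentions an unused argument rather than the `stop()` text.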

DeltaIV
  • Where is `my_read_csv` defined? When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It would be nice if your example avoided reading files on disk otherwise it's much harder for us to replicate the problem to help troubleshoot it. – MrFlick Mar 27 '18 at 16:09
  • @MrFlick thanks for the interest in the question. `my_read_csv` was a typing error: I restarted R and now the example works. It's really not hard to replicate the problem: you just need **one file** with extension `.csv` in your working dir. It can even be an empty file. `touch foo.csv`. – DeltaIV Mar 27 '18 at 16:18
  • When I run the code as provided, I do not get the "unused argument" error. It runs just fine. This error is not reproducible, which is why I assumed your `my_read_csv` was doing something other than what you claimed. – MrFlick Mar 27 '18 at 16:20
  • @MrFlick this is very weird. Can you wait a minute? I'll restart R, delete all in my working folder, and rewrite the question. There's clearly something weird going on in my system. Let me restart from a clean slate, and thanks again for your support. – DeltaIV Mar 27 '18 at 16:22
  • @MrFlick can you try again with the new code? As I explain better now, if there's no `.csv` file in the working directory, then no error is produced, but if there's at least 1 file (even if empty) the error is generated. I guess this is related to R lazy evaluation, but I'm not sure. – DeltaIV Mar 27 '18 at 16:31
  • You've changed the line from `dataset_name <- read_func(dataset)` to `dataset_name <- read_func(dataset, ...)` in this version, which causes the error. It has nothing to do with lazy evaluation. You are passing `select_func = select_func` to your function in the `sapply` whether or not it's NULL, and that is what causes the error. You might as well move that call into the `if` statement so you only pass that parameter when it's not NULL. Or change the first function so its signature is `read_and_assign <- function(dataset, read_func, select_func, ...){` as well. – MrFlick Mar 27 '18 at 16:34
  • @MrFlick I'll take your word for it that the error has nothing to do with lazy evaluation, since you clearly know more about this than I do. But then why doesn't it happen if there is no csv file in the working directory? I like your suggestion of having the same signature for `read_and_assign`, irrespective of whether `select_func` is `NULL` or not. The code is way cleaner this way. I'm not at a PC right now, but I'll test the modification as soon as possible and let you know if it fixed the bug. – DeltaIV Mar 27 '18 at 18:13
  • It works! Thanks @MrFlick. – DeltaIV Mar 27 '18 at 21:56
  • @MrFlick would you like to post your comment as an answer, so that I can accept it? Or do you think this wouldn't qualify as an answer? – DeltaIV Apr 02 '18 at 06:39

1 Answer

The easiest fix would probably be to "slurp up" the null select_func parameter in your read_and_assign function. This will prevent it from being passed through the ... parameter.

# ....
if (is.null(select_func)) {

  read_and_assign <- function(dataset, read_func, select_func, ...){
    dataset_name <- as.name(dataset)
    dataset_name <- read_func(dataset, ...)
    return(dataset_name)
  }

} else
# ....
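
The effect of the extra parameter can be seen in a small file-free sketch. The names `toy_reader`, `leaky` and `absorbing` are hypothetical and only stand in for `read_func` and the two versions of `read_and_assign`; the point is that a named argument that matches a formal parameter is bound to it and does not end up in `...`.

```r
# Hypothetical reader that accepts only a path (no select_func argument)
toy_reader <- function(path) toupper(path)

# select_func is not in the signature, so it lands in ... and reaches toy_reader
leaky <- function(dataset, read_func, ...) {
  read_func(dataset, ...)
}

# select_func is in the signature, so it is matched by name and kept out of ...
absorbing <- function(dataset, read_func, select_func, ...) {
  read_func(dataset, ...)
}

# The leaky version forwards select_func = NULL and fails
leaky_result <- tryCatch(
  leaky("a.csv", toy_reader, select_func = NULL),
  error = function(e) conditionMessage(e)
)

# The absorbing version drops it silently and succeeds
absorbing_result <- absorbing("a.csv", toy_reader, select_func = NULL)
```

Because `select_func` is never used inside `absorbing`, it does not matter that its value is `NULL`; it only needs to be named in the signature.
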
MrFlick