
Here is my original function:

best <- function(state=NULL, outcome){
    colNum <- list("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)[outcome]
    if (!is.null(colNum[[1]])){
        df <- read.csv("outcome-of-care-measures.csv", colClasses = "character")[,c(2,7,colNum[[1]])]
        df <- df[df[2] == state & df[3] != "Not Available",]
        if(nrow(df)==0){stop("invalid state")}
        df
    } else {stop("invalid outcome")}
}

When I call best(outcome = "heart attack") I get the following error:

 Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
 length of 'dimnames' [2] not equal to array extent

Debug shows the error being thrown on this line:

df <- df[df[2] == state & df[3] != "Not Available",]

However, when I change this line to df <- df[df[[2]] == state & df[[3]] != "Not Available",] the code runs correctly.

Alternatively, if I change the default value of state to "NULL" instead of NULL, I also don't get an error.

I believe the problem has something to do with the fact that length(NULL) == 0, but I don't understand why using double square brackets, as in df <- df[df[[2]] == state & df[[3]] != "Not Available",], avoids it. I was also under the impression that using NULL as a default argument value is good practice, but this suggests otherwise?

I wrote the following function purely to recreate the error without the CSV, so feel free to use it:

best2 <- function(state=NULL, outcome=NULL){
    df <- data.frame(hospital = c("H1", "H2", "H3"), state = c("NY","NY","CA"), mortality = c("Not Available", "14.1", "16.2"))
    df <- df[df[2] == state & df[3] != "Not Available",]
    df
}

If you call best2() with single square brackets, it throws the same "Error in matrix" as above. With double square brackets it runs, but gives this warning: Warning message: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
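To make the double-bracket case concrete, here is a minimal sketch of what happens step by step when state is NULL. It reuses the toy data from best2(); the names cmp, keep and res are illustrative only, and stringsAsFactors = FALSE is an assumption added so that the columns are plain character vectors regardless of R version.

```r
# Toy data mirroring best2(). stringsAsFactors = FALSE is an assumption made
# here so the columns are character vectors in every R version.
df <- data.frame(hospital  = c("H1", "H2", "H3"),
                 state     = c("NY", "NY", "CA"),
                 mortality = c("Not Available", "14.1", "16.2"),
                 stringsAsFactors = FALSE)

state <- NULL

# Comparing an atomic vector with NULL yields a zero-length logical vector.
cmp <- df[[2]] == state
length(cmp)

# Combining a zero-length logical with `&` stays zero-length.
keep <- cmp & df[[3]] != "Not Available"
length(keep)

# Indexing rows with a zero-length logical selects no rows.
res <- df[keep, ]
nrow(res)
```

Under these assumptions the function "runs" only in the sense that it returns an empty data frame, which is why nrow(df) == 0 would then trigger the "invalid state" branch in best().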

So my questions are:

1) Can someone please explain how the error occurs?
2) Is it bad practice to use NULL as a default value?
3) What is the difference between df[2] and df[[2]] here? According to class(), df[2] is a data.frame and df[[2]] is a character vector, but I'm confused about why they both work when state is supplied, why only one of them triggers the error above, and which is best practice.

Braide
  • What is your function's intended output? Any sample data to test it with? – NelsonGon Jan 09 '20 at 13:30
  • For your third question: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el – jogo Jan 09 '20 at 13:33
  • @NelsonGon please use the best2 function. It recreates the problem without the CSV. – Braide Jan 09 '20 at 13:34
  • Always pass data in as input, never create it inside the function. It is easier to manipulate that way. You need `[,2]` or `[[2]]`, i.e. `df <- df[df[,2] == state & df[,3] != "Not Available",]` – NelsonGon Jan 09 '20 at 13:38
  • @NelsonGon Is there a way to upload a CSV to Stack Overflow? And why does df[2] work when state is not NULL? – Braide Jan 09 '20 at 13:44
  • You can use `dput(head(df,n))` to share data. The reason your `best2` errors is due to subsetting. See jogo's attached question for more. [How to upload data to Stackoverflow](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – NelsonGon Jan 09 '20 at 13:45
  • Comparing `iris[1]` with `iris[[1]]` should be elucidating. The former returns a subset of the data.frame, which is still a data.frame. The latter extracts the column vector from the data.frame. – Roland Jan 09 '20 at 14:27
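
The comparison suggested in the last comment can be sketched as follows, using the built-in iris data set. The variable names are illustrative, and the threshold 5 is arbitrary; this only shows how the two forms differ, not a fix for best().

```r
# Single brackets keep the data.frame structure; double brackets extract the column.
col_df  <- iris[1]
col_vec <- iris[[1]]

class(col_df)           # "data.frame"
class(col_vec)          # "numeric"
dim(col_df)             # 150 rows, 1 column
is.null(dim(col_vec))   # TRUE: a plain vector has no dimensions

# Comparing the vector gives a plain logical vector.
mask_vec <- col_vec > 5

# Comparing the one-column data.frame goes through the data.frame method
# for comparison operators and gives a logical matrix with dimnames.
mask_df <- col_df > 5
is.matrix(mask_df)

# The underlying TRUE/FALSE values are the same in both cases.
identical(as.vector(mask_df), mask_vec)
```

Because the single-bracket form builds a matrix whose shape comes from the data.frame, a zero-length right-hand side such as NULL has nothing sensible to be matched against, whereas the vector form simply follows the ordinary rules for comparing vectors.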
