Here is my original function:
    best <- function(state = NULL, outcome) {
      colNum <- list("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)[outcome]
      if (!is.null(colNum[[1]])) {
        df <- read.csv("outcome-of-care-measures.csv", colClasses = "character")[, c(2, 7, colNum[[1]])]
        df <- df[df[2] == state & df[3] != "Not Available",]
        if (nrow(df) == 0) { stop("invalid state") }
        df
      } else {
        stop("invalid outcome")
      }
    }
When I call best(outcome = "heart attack"), I get the following error:
    Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
      length of 'dimnames' [2] not equal to array extent
The debugger shows the error being thrown on this line:
    df <- df[df[2] == state & df[3] != "Not Available",]
However, when I change this line to
    df <- df[df[[2]] == state & df[[3]] != "Not Available",]
the code runs correctly.
Alternatively, if I change the default value of state to "NULL" instead of NULL, I also don't get an error.
I believe the problem has something to do with the fact that length(NULL) == 0, but what I don't understand is why the problem goes away when I use double square brackets, as in df <- df[df[[2]] == state & df[[3]] != "Not Available",]. I was also under the impression that it is good practice to use NULL as a default argument value, but this suggests otherwise?
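For reference, here is a minimal sketch of how a comparison against NULL behaves on a plain character vector, which is what df[[2]] returns. The vector states is made up for illustration and does not come from the CSV; the point is only that the result has length zero rather than one element per row.

```r
# Illustrative vector, standing in for a character column such as df[[2]]
states <- c("NY", "NY", "CA")

# Comparing a vector with NULL yields a zero-length logical vector
cmp <- states == NULL
length(cmp)   # 0

# Combining it with a full-length logical vector is still zero-length
keep <- cmp & c(FALSE, TRUE, TRUE)
length(keep)  # 0

# A zero-length logical row index selects no rows
df <- data.frame(state = states, stringsAsFactors = FALSE)
nrow(df[keep, , drop = FALSE])  # 0
```

This would be consistent with the double-bracket version running without an error but returning no rows.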
I wrote the following function purely to reproduce the error without the CSV, so feel free to use it:
    best2 <- function(state = NULL, outcome = NULL) {
      df <- data.frame(hospital = c("H1", "H2", "H3"), state = c("NY", "NY", "CA"), mortality = c("Not Available", "14.1", "16.2"))
      df <- df[df[2] == state & df[3] != "Not Available",]
      df
    }
If you call best2() with single square brackets, it throws the same Error in matrix, but with double square brackets it only throws a warning:
    Warning message: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
and still runs.
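One way to make the failure explicit, sketched here under the assumption that a missing state should count as invalid input, is to check for NULL before doing any comparison. The name best2_checked and its error message are hypothetical and not part of the original function; stringsAsFactors = FALSE is also an addition, so that the columns are character vectors regardless of R version.

```r
# Hypothetical variant of best2() that rejects a NULL state up front
best2_checked <- function(state = NULL) {
  if (is.null(state)) {
    stop("state must be supplied")
  }
  df <- data.frame(
    hospital  = c("H1", "H2", "H3"),
    state     = c("NY", "NY", "CA"),
    mortality = c("Not Available", "14.1", "16.2"),
    stringsAsFactors = FALSE  # keep columns as character vectors
  )
  # [[ ]] extracts each column as a vector, so both comparisons are vector-wise
  df[df[["state"]] == state & df[["mortality"]] != "Not Available", ]
}

best2_checked("NY")  # one row: hospital H2
```

With this guard, calling best2_checked() without a state stops with a clear message instead of reaching the comparison at all.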
So my questions are:
1) Please can someone explain how the error is occurring?
2) Is it bad practice to use NULL as a default value?
3) What is the difference between df[2] and df[[2]] here? According to class(), df[2] is a data.frame and df[[2]] is a character vector, but I'm confused about why they both work, why only one of them triggers the aforementioned error, and which is best practice to use.
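To make question 3 concrete, here is a small sketch of the two extraction forms on an illustrative data frame (the data is the same made-up sample as in best2, with stringsAsFactors = FALSE added so the column is a character vector on any R version).

```r
df <- data.frame(
  hospital = c("H1", "H2", "H3"),
  state    = c("NY", "NY", "CA"),
  stringsAsFactors = FALSE
)

class(df[2])     # "data.frame": a one-column data frame
class(df[[2]])   # "character": the column itself

# Comparing the one-column data frame gives a 3 x 1 logical matrix;
# comparing the vector gives a plain logical vector of length 3
dim(df[2] == "NY")
length(df[[2]] == "NY")
```

So the two forms go through different comparison methods (the data frame method versus the ordinary vector comparison), which is presumably where the different behaviour with NULL comes from.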