Trying to understand error length of 'dimnames' [2] not equal to array extent in R data.frame

Question

Here is my original function:

best <- function(state=NULL, outcome){
    colNum <- list("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)[outcome]
    if (!is.null(colNum[[1]])){
        df <- read.csv("outcome-of-care-measures.csv", colClasses = "character")[,c(2,7,colNum[[1]])]
        df <- df[df[2] == state & df[3] != "Not Available",]
        if(nrow(df)==0){stop("invalid state")}
        df
    } else {stop("invalid outcome")}
}

When I call best(outcome = "heart attack") I get the following error:

 Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
 length of 'dimnames' [2] not equal to array extent

Debug shows the error being thrown on this line:

df <- df[df[2] == state & df[3] != "Not Available",]

However, when I change this line to `df <- df[df[[2]] == state & df[[3]] != "Not Available",]`, the code runs correctly.

Alternatively, if I change the default value of state to "NULL" instead of NULL, I also don't get an error.

I believe the problem has something to do with the fact that length(NULL) == 0, but what I don't quite understand is why the problem goes away when I use double square brackets: `df <- df[df[[2]] == state & df[[3]] != "Not Available",]`. I was also under the assumption that it is good practice to use NULL as a default argument value, but this suggests not?

I wrote the following function purely to recreate the error without the CSV, so feel free to use it:

best2 <- function(state=NULL, outcome=NULL){
    df <- data.frame(hospital = c("H1", "H2", "H3"), state = c("NY","NY","CA"), mortality = c("Not Available", "14.1", "16.2"))
    df <- df[df[2] == state & df[3] != "Not Available",]
    df
}

If you call best2() with single square brackets, it throws the same "Error in matrix" as above. With double square brackets it runs, but throws this warning:

 Warning message: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
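The double-bracket case can be checked step by step. This is only a sketch on the same toy data as best2, with stringsAsFactors = FALSE added as an assumption so that the columns are plain character vectors on any R version. It does not try to reproduce the single-bracket error, whose exact behaviour may depend on the R version.

```r
# Toy data from best2; stringsAsFactors = FALSE is an assumption so that the
# columns are character vectors regardless of R version.
df <- data.frame(hospital = c("H1", "H2", "H3"),
                 state = c("NY", "NY", "CA"),
                 mortality = c("Not Available", "14.1", "16.2"),
                 stringsAsFactors = FALSE)

state <- NULL

# Comparing a vector with NULL gives a zero-length logical vector.
cmp <- df[[2]] == state
length(cmp)

# A zero-length operand makes the combined condition zero-length too.
keep <- cmp & df[[3]] != "Not Available"
length(keep)

# Indexing rows with a zero-length logical selects no rows, without an error.
res <- df[keep, ]
nrow(res)
```

So with double brackets the subset "runs" only in the sense that it returns an empty data frame. In best, that empty result would then hit the nrow(df) == 0 check and stop with "invalid state".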

So my questions are:

1) Please can someone explain how the error is occurring?
2) Is it bad practice to use NULL as a default value?
3) What is the difference between df[2] and df[[2]] here? Using the class() function, df[2] is a data.frame and df[[2]] is a character vector, but I'm confused about why they both work, why one affects the aforementioned error, and which is best practice to use.
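On the second question, one pattern that can be used with a NULL default is an explicit is.null() check before the comparison, so that NULL never reaches `==`. The sketch below is an illustration only: filter_state is a hypothetical helper, not part of the original code, and treating state = NULL as "no state filter" is an assumption about the intended behaviour (calling stop("invalid state") at that point would be an equally plausible choice).

```r
# Hypothetical helper (not from the original post): keep rows with an
# available outcome, and filter by state only when one is supplied.
filter_state <- function(df, state = NULL) {
    keep <- df[[3]] != "Not Available"
    if (!is.null(state)) {
        # Only compare against state when it is not NULL.
        keep <- keep & df[[2]] == state
    }
    df[keep, ]
}

# Toy data from best2; stringsAsFactors = FALSE is an assumption.
df <- data.frame(hospital = c("H1", "H2", "H3"),
                 state = c("NY", "NY", "CA"),
                 mortality = c("Not Available", "14.1", "16.2"),
                 stringsAsFactors = FALSE)

all_states <- filter_state(df)        # no state filter applied
ny_only <- filter_state(df, "NY")     # only NY rows with an available value
```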

What is your function's intended output? Any sample data to test it with? — NelsonGon, Jan 09 '20 at 13:30
for your third question: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el — jogo, Jan 09 '20 at 13:33
@NelsonGon please use the best2 function. This recreates the problem without the CSV — Braide, Jan 09 '20 at 13:34
Always have data as input, never inside the function. Easier to manipulate that way. You need `[,2]` or `[[2]]` ie `df <- df[df[,2] == state & df[,3] != "Not Available",]` — NelsonGon, Jan 09 '20 at 13:38
@NelsonGon is there a way to upload csv to stack? why does df[2] work when state != NULL? — Braide, Jan 09 '20 at 13:44
You can use `dput(head(df,n))` to share data. The reason your `best2` errors is due to subsetting. See jogo's attached question for more. [How to upload data to Stackoverflow](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — NelsonGon, Jan 09 '20 at 13:45
Comparing `iris[1]` with `iris[[1]]` should be elucidating. The former returns a subset of the data.frame, which is still a data.frame. The latter extracts the column vector from the data.frame. — Roland, Jan 09 '20 at 14:27
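
The distinction described in the comments above can be seen on the toy data from best2. This is a minimal sketch, again assuming stringsAsFactors = FALSE so the column is a character vector. The names sub, vec, m and v are introduced here for illustration only.

```r
# Toy data from best2 (two columns are enough here); stringsAsFactors = FALSE
# is an assumption.
df <- data.frame(hospital = c("H1", "H2", "H3"),
                 state = c("NY", "NY", "CA"),
                 stringsAsFactors = FALSE)

sub <- df[2]      # single bracket: a one-column data.frame
vec <- df[[2]]    # double bracket: the column itself
class(sub)        # "data.frame"
class(vec)        # "character"

# Comparing the one-column data.frame gives a logical matrix,
# while comparing the vector gives a plain logical vector.
m <- sub == "NY"
v <- vec == "NY"
dim(m)            # 3 1
length(v)         # 3
```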
