Why does R have inconsistent behaviors when a non-existent rowname is retrieved from a data frame?

Question

I wonder why two data frames, a and b, give different results when a non-existent row name is retrieved. For example,

a <- as.data.frame(matrix(1:3, ncol = 1, nrow = 3, dimnames = list(c("A1", "A10", "B"), "V1")))
a
    V1
A1   1
A10  2
B    3

b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames = list(c("A10", "B"), "V1")))
b
    V1
A10  4
B    5

Let's try to retrieve "A10", "A1", "A", "B", and "C" from data frame a:

> a["A10", 1]
[1] 2
> a["A1", 1]
[1] 1                    # expected
> a["A", 1]
[1] NA                   # expected
> a["B", 1]
[1] 3                    # expected
> a["C", 1]
[1] NA                   # expected

Let's do the same for data frame b:

> b["A10", 1]
[1] 4
> b["A1", 1]
[1] 4                    # unexpected, should be NA
> b["A", 1]              
[1] 4                    # unexpected, should be NA
> b["B", 1]
[1] 5                    # expected
> b["C", 1]
[1] NA                   # expected

Given that a["A", 1] returns NA, why don't b["A", 1] and b["A1", 1] return NA as well?

PS. R version 3.5.2

Thanks @AhmedAli, I had heard something about it, e.g. https://stackoverflow.com/questions/14153904/why-does-r-use-partial-matching, but shouldn't it be limited to lists/colnames only? — foehn, Jan 14 '22 at 22:00
No, it seems to be present in data.frame as well. For example, see https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names You can also check that data.frame subsetting uses pmatch with View(`[.data.frame`) — Ahmed Ali, Jan 14 '22 at 22:06
Hmmm. ?"[" says "Unlike S (Becker et al p. 358), R never uses partial matching when extracting by ‘[’" - is this a documentation bug (or at least a doc/code mismatch), or have I misunderstood something?? — Ben Bolker, Jan 15 '22 at 01:35
@Ben Bolker I read that the same way you do. It appears that there is an undocumented exception. This has to be partial matching as Ahmed Ali said. I tried this with various combinations of letters and numbers, letters only, and numbers only (I guess numbers vs letters is a moot point since they are all read as characters). No matter what, if an exact match is unavailable, R accepts the call based on the first characters in the row name matching the index you use. — Tanner33, Jan 15 '22 at 02:25

Mikael Jagan · Accepted Answer · 2022-01-15T04:25:38.357

Synthesizing some of the comments here...

?`[` says:

Unlike S (Becker et al p. 358), R never uses partial matching when extracting by [, and partial matching is not by default used by [[ (see argument exact).
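To make the list case from the comments concrete, here is a minimal sketch with a made-up list lst (not from the question), run with R's default options. It shows $ partially matching, [[ matching exactly unless exact = FALSE, and [ never partially matching:

```r
# Hypothetical list with a single element named "abc"
lst <- list(abc = 1)

lst$ab                      # `$` partially matches "abc": returns 1
lst[["ab"]]                 # `[[` matches exactly by default: returns NULL
lst[["ab", exact = FALSE]]  # opt in to partial matching: returns 1
lst["ab"]                   # `[` never partially matches: a length-one list named <NA> holding NULL
```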

But ?`[.data.frame` says:

Both [ and [[ extraction methods partially match row names. By default neither partially match column names, but [[ will if exact = FALSE (and with a warning if exact = NA). If you want to exact matching on row names use match, as in the examples.

The example given there is:

sw <- swiss[1:5, 1:4]
sw["C", ]
##            Fertility Agriculture Examination Education
## Courtelary      80.2          17          15        12

sw[match("C", row.names(sw)), ]
##    Fertility Agriculture Examination Education
## NA        NA          NA          NA        NA
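Applied to the question's data frame b, the same match() idiom gives the NA the asker expected. This is a sketch assuming the partial-matching behaviour of [.data.frame described above (as observed in R 3.5.2 in the question):

```r
b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2,
                          dimnames = list(c("A10", "B"), "V1")))

# Character row index: partial matching, "A" and "A1" are unique prefixes of "A10"
b["A", 1]                         # 4
b["A1", 1]                        # 4

# Integer row index from match(): exact matching, unknown names become NA
b[match("A", row.names(b)), 1]    # NA
b[match("A1", row.names(b)), 1]   # NA
b[match("A10", row.names(b)), 1]  # 4
```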

Meanwhile:

as.matrix(sw)["C", ]
## Error in as.matrix(sw)["C", ] : subscript out of bounds

So row names of matrices are matched exactly while row names of data frames are matched partially, and both behaviours are documented.
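The same contrast with the question's own data, as a short sketch (m is an illustrative name; tryCatch() is used only so the error message can be captured instead of stopping the script):

```r
b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2,
                          dimnames = list(c("A10", "B"), "V1")))
m <- as.matrix(b)

m["A10", 1]  # exact row name: 4

# Matrices only match row names exactly, so "A" is out of bounds
tryCatch(m["A", 1], error = function(e) conditionMessage(e))
# "subscript out of bounds"
```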

[.data.frame is implemented in R, not C, so you can inspect the source code by printing the function. The partial matching happens here:

    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
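The pmatch() rule explains the difference between a and b: an exact match wins, and otherwise a prefix is accepted only if it identifies exactly one row name. A small sketch calling pmatch() directly on the row names from the question (rows_a and rows_b are illustrative names):

```r
rows_a <- c("A1", "A10", "B")
rows_b <- c("A10", "B")
idx <- c("A10", "A1", "A", "B", "C")

# In a: "A1" matches exactly; "A" is a prefix of both "A1" and "A10",
# so the partial match is ambiguous and gives NA.
pmatch(idx, rows_a, duplicates.ok = TRUE)
# [1]  2  1 NA  3 NA

# In b: "A1" and "A" are each a prefix of exactly one row name ("A10"),
# so both resolve to row 1.
pmatch(idx, rows_b, duplicates.ok = TRUE)
# [1]  1  1  1  2 NA
```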

There happens to be a recent thread on Bugzilla about partial matching of row names of data frames. (No discussion yet...)

It is definitely surprising that [.data.frame doesn't match the behaviour of [ with respect to character indices.
