It's often said that data.frame inherits from list, which makes sense given many common paradigms for accessing data.frame columns ($, sapply, etc.).

Yet "list" is not among the items returned in the class list of a data.frame object:

> dat <- data.frame(x=runif(100), y=runif(100), z=runif(100), g=as.factor(rep(letters[1:10], 10)))
> class(dat)
[1] "data.frame"

Unclassing a data.frame shows that it's a list:

> class(unclass(dat))
[1] "list"

And testing suggests that the default method is called in preference to the list method when there's no data.frame method:

> f <- function(x) UseMethod('f')
> f.default <- function(x) cat("Default")
> f.list <- function(x) cat('List')
> f(dat)
Default
> f.data.frame <- function(x) cat('DF')
> f(dat)
DF
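
The same experiment can be sketched with the individual checks spelled out. The generic name `describe` and the small data frame here are made up for illustration. As far as I understand it, S3 dispatch walks the `class` attribute and then falls back to the default method, whereas a bare list with no `class` attribute dispatches on its implicit class:

```r
# Hypothetical generic, used only for this illustration.
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "Default"
describe.list <- function(x) "List"

dat <- data.frame(x = 1:3, y = c("a", "b", "c"))

# The class attribute holds only "data.frame", so inherits() says no ...
inherits(dat, "list")    # FALSE
# ... but the underlying storage is a list.
is.list(dat)             # TRUE
typeof(dat)              # "list"

# No describe.data.frame exists, so dispatch falls through to the default.
on_df <- describe(dat)             # "Default"
# With the class attribute removed, the implicit class "list" is used.
on_list <- describe(unclass(dat))  # "List"
```

So a data.frame method that wants the list behaviour has to ask for it explicitly, for example by unclassing first.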

Two questions then:

  1. Does the failure to have data.frame formally inherit from list have any advantages from a design perspective?
  2. How do those functions that seem to treat data.frames as lists know to treat them as lists? From looking at lapply it looks like it goes to C internal code quite quickly, so perhaps that's it, but my mind's a little blown here.
Ari B. Friedman
  • I guess it boils down to efficiency. S3 method dispatch is costly and lists are a very basic data structure in R. Thus, they are dealt with at the C level. E.g., even `is.list` is a primitive (contrary to `is.data.frame`). – Roland Oct 26 '13 at 14:20
  • Whoever says `data.frame` _inherits_ from `list` is wrong. What they probably mean is that data.frames are implemented as a list with certain attributes and characteristics. `lapply` calls `as.list` if `X` is not a vector or if `is.object` is `TRUE` (basically if there's a `class` attribute). `as.list` is generic with a `data.frame` method. – Joshua Ulrich Oct 26 '13 at 15:19
  • I discuss this a little in http://adv-r.had.co.nz/OO-essentials.html#method-dispatch. @JoshuaUlrich I don't think it's unreasonable to say that data.frame inherits from list, but it's complicated because list and data frame don't belong to the same object system. – hadley Oct 27 '13 at 02:21
  • `!isTRUE(inherits(dat, "list"))`. // In many if not most cases one would expect data frame methods to behave differently from list methods. For example, `utils:::head.default` works on data frames as lists but... Or imagine using something like `dat[[y]][x]` instead of matrix-like indexing `dat[x,y]`. The few cases that treat data frames as lists either use `as.list` (as in `lapply`) or are internal or primitive (such as `mapply`, `vapply`, `c`). `identical(c(data.frame(a=1), data.frame(b=2)), c(list(a=1), list(b=2)))` – lebatsnok Jan 08 '14 at 08:40
  • `is(iris)` gives: "data.frame" "list" "oldClass" "data.frameOrNULL" "vector". So list is a superclass of data.frame – Karl Forner Feb 06 '14 at 12:13
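
The points raised in these comments can be checked directly. This is only a rough sketch with a made-up two-column data frame: it looks up the `as.list` method for data frames, compares `lapply` on the data frame with `lapply` on the converted list, and contrasts the S3 view (`inherits`) with the S4 view (`methods::is`).

```r
dat <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

# as.list is an S3 generic, and base R provides a data.frame method for it.
has_method <- is.function(getS3method("as.list", "data.frame"))

# Converting explicitly yields a plain list of the columns.
cols <- as.list(dat)
class(cols)    # "list"
names(cols)    # "x" "y"

# lapply over the data frame matches lapply over the converted list.
means_df <- lapply(dat, mean)
means_list <- lapply(cols, mean)
identical(means_df, means_list)    # TRUE

# S3 view: "list" is not in the class attribute.
s3_view <- inherits(dat, "list")             # FALSE
# S4 view: the methods package lists "list" among the superclasses.
s4_view <- "list" %in% methods::is(dat)      # TRUE
```

Both answers are consistent once you notice that `inherits` and `is` are consulting different object systems.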

1 Answer

I confess that classes in R are a bit confusing to me as well. But I remember once reading something like "In R data.frames are actually lists of vectors". Using the code from your example, we can verify this:

> is.list(dat)
[1] TRUE

See ?is.list for details.

Note that we can also use the [[]] operator to access the elements (columns) of dat, which is the normal way to access elements of lists in R:

> identical(dat$x, dat[[1]])
[1] TRUE

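We can also verify that a column is actually a vector (note that is.vector returns FALSE for columns that carry attributes other than names, such as the factor g):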

> is.vector(dat$x)
[1] TRUE
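
To see what "a list of vectors" looks like underneath, here is a small sketch with a made-up data frame (one numeric column, one factor). It inspects the storage type and attributes, and shows why the factor column fails `is.vector` even though it is still atomic:

```r
dat <- data.frame(x = c(0.1, 0.2, 0.3), g = factor(c("a", "b", "a")))

# The storage type is "list"; the data.frame behaviour comes from attributes.
storage <- typeof(dat)               # "list"
attr_names <- names(attributes(dat)) # includes names, class and row.names

# A plain numeric column has no extra attributes, so is.vector() is TRUE.
x_is_vector <- is.vector(dat$x)      # TRUE
# A factor carries "levels" and "class" attributes, so is.vector() is FALSE,
# although it is still an atomic (integer-backed) vector underneath.
g_is_vector <- is.vector(dat$g)      # FALSE
g_is_atomic <- is.atomic(dat$g)      # TRUE

# Storage type of each column.
col_types <- vapply(dat, typeof, character(1))  # x: "double", g: "integer"
```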
Ari