
Here's a hack to create an empty data frame with no rows and no columns:

iris[FALSE, FALSE]
#> data frame with 0 columns and 0 rows

Smarter-looking code creates a spurious column:

x <- list(NULL)
class(x) <- c("data.frame")
attr(x, "row.names") <- integer(0)
str(x)
#> 'data.frame':    0 obs. of  1 variable:
#>  $ : NULL

Is there a non-hack alternative?

The reason to create such a thing is to satisfy a function that can handle empty data frames but not NULLs.

This is different from similar questions because it is about having no columns as well as no rows.
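
To make the use case concrete, here is a minimal sketch in base R, assuming a list whose entries are either data frames or NULL (the name `cells` and its contents are hypothetical). It swaps each NULL for a zero-column, zero-row data frame, so that a downstream function expecting data frames never sees a NULL.

```r
# Hypothetical list column: one entry is missing and shows up as NULL.
cells <- list(
  data.frame(a = 1:2),
  NULL,
  data.frame(a = 3L)
)

# Replace every NULL with an empty data frame; leave the rest untouched.
cells_fixed <- lapply(cells, function(x) {
  if (is.null(x)) iris[FALSE, FALSE] else x
})

# Every entry is now a data frame, and the replaced one has no rows or columns.
all(vapply(cells_fixed, is.data.frame, logical(1)))
dim(cells_fixed[[2]])
```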

nacnudus
  • But that question is about specifying column types. – nacnudus Jun 07 '16 at 01:19
  • `structure(list(), class="data.frame")` would make your original approach of adding a class to a list work. – thelatemail Jun 07 '16 at 01:32
  • I don't think this is a duplicate – thelatemail Jun 07 '16 at 01:34
  • OP says they're trying "to satisfy a function that can handle empty data frames but not NULLs"... if OP is the one writing the function, then can I suggest they're attacking this from the wrong side? What about testing `inherits(x, "data.frame")`, which passes for a `data.frame` (empty or not) but fails for `NULL`? If they're trying to pass data into an existing function, then `data.frame()` should get past the test (which could very likely be the one above anyway). – Jonathan Carroll Jun 07 '16 at 01:39
  • @JonathanCarroll OP is trying to satisfy `unnest` in `tidyr` package ;) – nacnudus Jun 07 '16 at 02:13
  • Can I ask, then: are you trying to work around a situation where you somehow have a list involving `NULL`, or are you trying to avoid creating that situation? `unnest` will gladly take an `NA` entry... `iris %>% nest(-Species) -> ndf; ndf$data[[2]] <- NA_integer_; unnest(ndf, Species)` – Jonathan Carroll Jun 07 '16 at 02:29
  • @JonathanCarroll I'm using `bind_rows` to join data frames with list columns, which results in NULLs where a list column isn't present in one of the data frames. – nacnudus Jun 07 '16 at 09:26
  • Check out `data.table::rbindlist(list(DT1, DT2), fill=TRUE)`? – Jonathan Carroll Jun 07 '16 at 09:30
  • It's still NULL. I think I agree with it using NULL too, since all the available NA types (NA_real_ etc.) are atomic, which would be inconsistent with the data.frame elements, which are non-atomic. – nacnudus Jun 07 '16 at 09:41
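
The difference between the attempt in the question and thelatemail's `structure(list(), class="data.frame")` comes down to list length: `list(NULL)` is a list with one element (which happens to be NULL), so it yields one column, while `list()` has no elements at all. A short sketch in base R that also shows the `inherits()` check Jonathan Carroll mentions:

```r
# list(NULL) holds one element, so a data frame built on it has one column.
length(list(NULL))
#> [1] 1
length(list())
#> [1] 0

# Building on an empty list gives zero rows and zero columns.
x <- structure(list(), class = "data.frame")
dim(x)
#> [1] 0 0

# inherits() accepts the empty data frame and rejects NULL.
inherits(x, "data.frame")
#> [1] TRUE
inherits(NULL, "data.frame")
#> [1] FALSE
```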

2 Answers

df <- data.frame()
str(df)
#> 'data.frame':    0 obs. of  0 variables
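
As a quick check, sketched in base R with a made-up one-row data frame called `row`, the result of `data.frame()` has zero rows and zero columns, is accepted by `is.data.frame()`, and can serve as the starting point for `rbind()`:

```r
df <- data.frame()

# Zero rows, zero columns, and still a genuine data frame.
dim(df)
is.data.frame(df)

# Hypothetical one-row data frame, used only for illustration.
row <- data.frame(a = 1L, b = "x")

# rbind() drops the empty data frame, so the result has the columns of `row`.
out <- rbind(df, row)
names(out)
nrow(out)
```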
Psidom
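# list() rather than NULL is the base object here: NULL cannot carry
# attributes, and newer versions of R warn about structure(NULL, ...).
empty.data.frame <- function() {
  structure(list(),
            names = character(0),
            row.names = integer(0),
            class = "data.frame")
}
empty.data.frame()
#> data frame with 0 columns and 0 rows

# thelatemail's suggestion from the comments (fastest)
empty.data.frame2 <- function() {
  structure(list(), class = "data.frame")
}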

library(microbenchmark)
microbenchmark(data.frame(), empty.data.frame(), empty.data.frame2())
#> Unit: microseconds
#>                 expr    min      lq     mean median     uq    max neval
#>         data.frame() 12.831 13.4485 15.18162 13.879 14.378 65.967   100
#>   empty.data.frame()  8.323  9.0515  9.76106  9.363  9.732 19.427   100
#>  empty.data.frame2()  5.884  6.9650  7.63442  7.240  7.540 17.746   100
nacnudus
  • Is performance really an issue here? The only possible scaling is with repetition. – Jonathan Carroll Jun 07 '16 at 02:24
  • @JonathanCarroll how would you scale an empty data frame? – Pierre L Jun 07 '16 at 02:35
  • @PierreLafortune that's my point. This isn't going to be a costly procedure no matter which way you do it. You may potentially do it lots of times (under some odd scenario) but even then it's not slow. – Jonathan Carroll Jun 07 '16 at 02:36
  • His test is most likely for learning to see which function call gets to the point with the least internal moves. – Pierre L Jun 07 '16 at 02:37
  • "least internal moves" -- precisely, to get as close as possible to a variable declaration like "int a;". I should have said that in the question. In practice it's far faster to create one and copy it. – nacnudus Jun 07 '16 at 08:37
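
The "create one and copy it" approach from the last comment could look like the sketch below. The names `EMPTY_DF` and `new_empty_df` are hypothetical. It relies on R's copy-on-modify semantics: handing out the template does not duplicate it, and changing the copy leaves the template as it was.

```r
# Build the template once.
EMPTY_DF <- data.frame()

# Hypothetical helper: return the prebuilt template instead of rebuilding it.
new_empty_df <- function() EMPTY_DF

y <- new_empty_df()
y$a <- integer(0)   # modify the copy only

ncol(y)         # the copy gained a column
ncol(EMPTY_DF)  # the template is still empty
```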