I found how to initialize an empty data frame with 3 or 4 columns. It looks like this:

df <- data.frame(Date=as.Date(character()),
             File=character(), 
             User=numeric(), 
             stringsAsFactors=FALSE)

However, what is the most efficient way to initialize an empty data.frame with a large number of column names? For example:

mynames <- paste("hello", c(1:10000))

The wrong way I tried is:

df <- data.frame(mynames=numeric())

Thanks a lot in advance.

Wilmer E. Henao
  • Related: [*Create an empty data.frame*](http://stackoverflow.com/questions/10689055/create-an-empty-data-frame) – Jaap May 21 '17 at 09:38

2 Answers


Maybe this -

df <- data.frame(matrix(ncol = 10000, nrow = 0))
colnames(df) <- paste0("hello", c(1:10000))
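To see why the attempt in the question did not work, and to check that the approach above gives the intended shape, here is a minimal sketch in base R. The variable names `wrong` and `n` are only illustrative and do not come from the question.

```r
# The attempt from the question: the argument name itself becomes the
# column name, so the result has a single column called "mynames".
wrong <- data.frame(mynames = numeric())
dim(wrong)    # 0 rows, 1 column
names(wrong)  # "mynames"

# The matrix-based approach: build an empty 0 x n frame, then rename.
n <- 10000
df <- data.frame(matrix(ncol = n, nrow = 0))
colnames(df) <- paste0("hello", seq_len(n))
dim(df)             # 0 rows, 10000 columns
head(names(df), 3)  # "hello1" "hello2" "hello3"
```

Note that `paste0` is used here rather than `paste`, so the names contain no space between "hello" and the number.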

And @joran's suggestion, which does the same in a single call:

df <- setNames(data.frame(matrix(ncol = 10000, nrow = 0)), paste0("hello", 1:10000))
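One thing to be aware of: an empty `matrix()` is filled with `NA`, so every column created this way is of type logical. If the columns should be numeric from the start (as in the question's attempt), one possible variation, offered here as a sketch rather than as part of the original suggestion, is to build a named list of empty numeric vectors and convert it. The names `col_names`, `from_matrix`, `empty_cols` and `typed` are illustrative.

```r
col_names <- paste0("hello", seq_len(10000))

# Matrix-based construction: all columns come out as logical.
from_matrix <- setNames(
  data.frame(matrix(ncol = length(col_names), nrow = 0)),
  col_names
)
unique(vapply(from_matrix, class, character(1)))

# Variation: a list of zero-length numeric vectors, one per column name,
# converted to a data frame with 0 rows and numeric columns.
empty_cols <- replicate(length(col_names), numeric(0), simplify = FALSE)
typed <- as.data.frame(setNames(empty_cols, col_names))
unique(vapply(typed, class, character(1)))
dim(typed)
```

Swapping `numeric(0)` for `character(0)` or another empty vector would give columns of that type instead.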

TheComeOnMan

I would do this using setDF (or setDT, if you prefer data.table as output) and setnames:

library(data.table)

DF <- setnames(setDF(lapply(integer(1e4), function(...) character(0L))),
               paste0("hello", 1:1e4))
head(names(DF))
# [1] "hello1" "hello2" "hello3" "hello4" "hello5" "hello6"

Both steps (setnames and setDF) modify their input by reference, so no copies are made, which makes them more efficient than their base counterparts.

A benchmark:

library(microbenchmark)

microbenchmark(times = 1000,
               base = {df <- data.frame(matrix(ncol = 10000, nrow = 0))
               colnames(df) <- paste0("hello", c(1:10000))},
               DT = setnames(setDF(lapply(integer(1e4), 
                                          function(...) character(0L))),
                             paste0("hello", 1:1e4)))
# Unit: milliseconds
#  expr      min       lq     mean   median       uq      max neval cld
#  base 26.77218 30.94223 37.30173 36.76721 37.80338 102.2379  1000   b
#    DT 16.68004 23.18865 30.60573 29.18421 36.03590 178.1045  1000  a 
MichaelChirico