73

I am trying to collect some data from multiple subsets of a data set and need to create a data frame to collect the results. My problem is that I don't know how to create an empty data frame with a defined number of columns without actually having data to put into it.

collect1 <- c()  ## i'd like to create empty df w/ 3 columns: `id`, `max1` and `min1`

for(i in 1:10){
collect1$id <- i
ss1 <- subset(df1, df1$id == i)
collect1$max1 <- max(ss1$value)
collect1$min1 <- min(ss1$value)
}

I feel very dumb asking this question (I almost feel like I've asked it on SO before but can't find it) but would greatly appreciate any help.

zx8754
screechOwl
  • untested, but this was the first hit from Google, looks like the answer from @Gabor should work: http://r.789695.n4.nabble.com/Empty-data-frame-td846772.html – Chase Mar 29 '12 at 00:29

11 Answers

154

Would a data frame of NAs work? Something like:

data.frame(matrix(NA, nrow = 2, ncol = 3))

If you need to be more specific about the data type, you may prefer NA_integer_, NA_real_, NA_complex_, or NA_character_ instead of plain NA, which is logical.

Something else that may be more specific than the NAs is:

data.frame(matrix(vector(mode = 'numeric',length = 6), nrow = 2, ncol = 3))

where the mode can be of any type. See ?vector
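As a minimal sketch of the typed-NA idea, assuming you want one column per type (the column names `id`, `value` and `label` are only illustrative), you can build each column from the matching NA constant:

```r
# Number of placeholder rows (illustrative value)
n <- 2

# Each column is filled with the NA constant of the type it should hold
typed <- data.frame(id    = rep(NA_integer_, n),
                    value = rep(NA_real_, n),
                    label = rep(NA_character_, n),
                    stringsAsFactors = FALSE)

# Inspect the resulting column classes
sapply(typed, class)
```

Unlike the matrix approach, this lets the columns have different types, since a matrix can hold only one type.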

aatrujillob
  • This is the way I generally do it. – Hansi Mar 29 '12 at 07:11
  • Note that if you're creating data frames of a consistent size, based on some condition you can do something like: `nrow = length(df[df$columnX == "some condition",1]), ncol = length(df)` to get a dataframe of the exact dimensions you want. – DryLabRebel Oct 31 '22 at 00:56
37

Just create a data frame of empty vectors:

collect1 <- data.frame(id = character(0), max1 = numeric(0), min1 = numeric(0))

But if you know how many rows you're going to have in advance, you should just create the data frame with that many rows to start with.
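A small sketch of how a zero-row data frame behaves, using the column names from the question (the values appended here are made up for illustration):

```r
# Zero rows, three typed columns
collect1 <- data.frame(id = integer(0), max1 = numeric(0), min1 = numeric(0))
str(collect1)

# Appending a row works, but growing a data frame row by row copies it
# on every step, which is why preallocating is preferable when the
# number of rows is known in advance
collect1 <- rbind(collect1, data.frame(id = 1L, max1 = 3.5, min1 = -1.2))
nrow(collect1)
```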

Hong Ooi
14

You can do something like:

N <- 10
collect1 <- data.frame(id   = integer(N),
                       max1 = numeric(N),
                       min1 = numeric(N))

Note that the rest of your code forgets to use the row index when filling the data.frame row by row. It should be:

for(i in seq_len(N)){
   collect1$id[i] <- i
   ss1 <- subset(df1, df1$id == i)
   collect1$max1[i] <- max(ss1$value)
   collect1$min1[i] <- min(ss1$value)
}

Finally, I would say that there are many alternatives for doing what you are trying to accomplish, some of which would be much more efficient and require a lot less typing. You could for example look at the aggregate function, or ddply from the plyr package.
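For instance, a rough sketch of the aggregate route might look like this. The `df1` built here is made-up sample data standing in for the question's data frame, assumed to have `id` and `value` columns:

```r
# Hypothetical sample data standing in for the question's df1
set.seed(1)
df1 <- data.frame(id = rep(1:10, each = 5), value = rnorm(50))

# One row per id; FUN returns both statistics at once
agg <- aggregate(value ~ id, data = df1,
                 FUN = function(v) c(max1 = max(v), min1 = min(v)))

# aggregate() stores the two statistics as a matrix column;
# data.frame() expands that matrix into separate max1 and min1 columns
collect1 <- data.frame(id = agg$id, agg$value)
head(collect1)
```

No empty data frame and no loop are needed here, because the result is built in one step.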

flodel
9

You may use NULL instead of NA. This creates a truly empty data frame.

  • This is by far the simplest answer. It remains agnostic about the data type of the column, and it doesn't create a first row by default as some of the other answers do. – rsoren Apr 22 '14 at 17:14
  • Did you mean NULL in the matrix(). Got Error in matrix(NULL, nrow = 0, ncol = 3) : 'data' must be of a vector type, was 'NULL' – Jerry T Mar 10 '17 at 17:26
  • @JerryT data.frame(x = NULL, y = NULL) or simply data.frame() creates an empty data frame but does not allow filling in values using a loop – Dr Nisha Arora May 30 '21 at 03:59
8
df = data.frame(matrix("", ncol = 3, nrow = 10))  
jwalton
Amarjeet
  • better use "NA" instead of " " while creating the df. otherwise all columns are a Factor with one level. – Jens Aug 03 '15 at 15:15
  • Great solution, but misses an extra ending `)`. And as I really dislike factors, I use tibbles: `dplyr::as_tibble(matrix(NA, ncol = 3, nrow = 10))`. You could use `NA_character_` and `NA_integer_` etc. to force a data type. – MS Berends Dec 06 '17 at 13:32
8

Here is a solution if you want an empty data frame with a defined number of rows and NO columns:

df = data.frame(matrix(NA, ncol=1, nrow=10))[-1]
Sally
2

The solution given in another forum might help. Basically, it is:

Cols <- paste("A", 1:5, sep="")
DF <- read.table(textConnection(""), col.names = Cols,colClasses = "character")

> str(DF)
'data.frame':   0 obs. of  5 variables:
$ A1: chr
$ A2: chr
$ A3: chr
$ A4: chr
$ A5: chr

You can change the colClasses to fit your needs.

Original link is https://stat.ethz.ch/pipermail/r-help/2008-August/169966.html

Jose
1

A more general method to create an arbitrary size data frame is to create a 1-by-n data frame from a matrix of the same dimension. Then, you can immediately drop the first row:

> v <- data.frame(matrix(NA, nrow=1, ncol=10))
> v <- v[-1, , drop=FALSE]
> v
 [1] X1  X2  X3  X4  X5  X6  X7  X8  X9  X10
<0 rows> (or 0-length row.names)
Brendon
  • Instead of dropping the first row, you could instead create the matrix with `nrow=0` – ping Mar 25 '15 at 16:01
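A quick sketch of the `nrow=0` suggestion from the comment above, using three columns named after the question's `id`, `max1` and `min1`:

```r
# Build a 0-row, 3-column data frame directly; there is no row to drop afterwards
v <- data.frame(matrix(NA, nrow = 0, ncol = 3))

# Replace the default column names with the ones you need
names(v) <- c("id", "max1", "min1")
dim(v)
```

Note that the columns start out as logical, because plain NA is logical; they take on another type once values of that type are assigned.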
1

If only the column names are available, like:

cnms <- c("Nam1","Nam2","Nam3")

To create an empty data frame with the above variable names, first create a data.frame object:

emptydf <- data.frame()

Now add a zero-length column for each name, which creates an empty data frame with the given variable names:

for (i in seq_along(cnms)) {
     emptydf[[cnms[i]]] <- logical(0)  # zero-length placeholder column
 }
Vikram Venkat
0

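seq_len(nrow(df)) may help: it gives one index per row of your data, so you can create a data.frame with the desired number of rows. (seq_along(df) would count the columns of a data frame, not its rows.)

    listdf <- data.frame(ID   = seq_len(nrow(df)),
                         var1 = seq_len(nrow(df)),
                         var2 = seq_len(nrow(df)))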
FRANK Liu
0

I came across the same problem and have a cleaner solution. Instead of creating an empty data.frame, you can collect your results in a named list and convert it to a data.frame once all results have been added.

For the case of adding features one at a time this works best.

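mylist = list()
# Index by name with [[ ]]; mylist$column would always write to one element literally named "column"
for(column in 1:10) mylist[[paste0("V", column)]] = rnorm(10)
mydf = data.frame(mylist)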

For the case of adding rows one at a time this becomes tricky due to mixed types. If all types are the same it is easy.

mylist = list()
# mylist[[row]] stores one vector per row; mylist$row would keep overwriting a single element
for(row in 1:10) mylist[[row]] = rnorm(10)
mydf = data.frame(do.call(rbind, mylist))

I haven't found a simple way to add rows of mixed types. In this case, if you must do it this way, the empty data.frame is probably the best solution.
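One possible approach for rows of mixed types, offered only as a sketch and not taken from the answer above, is to store each row as a one-row data.frame and bind them at the end. The column names `id`, `label` and `score` are made up for illustration:

```r
# Preallocate a list with one slot per row
rows <- vector("list", 3)

for (i in 1:3) {
  # Each row is a one-row data frame, so every column keeps its own type
  rows[[i]] <- data.frame(id = i, label = letters[i], score = i / 2,
                          stringsAsFactors = FALSE)
}

# Bind all one-row data frames into a single data frame
mydf <- do.call(rbind, rows)
str(mydf)
```

This binds only once, after the loop, instead of growing a data frame on every iteration.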

Adam Waring