I'm relatively new to R and was wondering what the most efficient way is to construct a data frame iteratively, one row at a time. The number of iterations "n" and the length of each row "l" are known beforehand.

  1. Create empty dataframe, add a row each iteration
  2. Preallocate n x l dataframe, modify a row each iteration
  3. Preallocate n x l matrix, modify a row each iteration, make dataframe from matrix
  4. Something else
fmark
daltonb

2 Answers

Pre-allocate!!!

And use a matrix if all the data are the same type. Assigning rows into a matrix is much faster than assigning them into a data.frame.

For example:

> n <- 1000      # Number of rows
> row <- 1:20*1  # one row
> 
> # Adding row, one-by-one
> Data <- data.frame()
> system.time(for(i in 1:n) Data <- rbind(Data,row))
   user  system elapsed 
   2.18    0.00    2.18 
> 
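> # Pre-allocated data.frame (Data already holds n rows from the loop above,
> # so the loop below only overwrites existing rows)
> Data <- as.data.frame(Data)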
> system.time(for(i in 1:n) Data[i,] <- row)
   user  system elapsed 
   0.94    0.00    0.93
>
> # Pre-allocated matrix (fast!); converted back to a data.frame after the loop
> Data <- as.matrix(Data)
> system.time({ for(i in 1:n) Data[i,] <- row; Data <- as.data.frame(Data) })
   user  system elapsed 
      0       0       0 
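
The timings above reuse the `Data` object left over from the first loop as the pre-allocated container. When starting from scratch, the same idea can be sketched as below. This is only a minimal illustration: `build_rows` and `make_row` are hypothetical names, and it assumes every row is numeric and has the same known length.

```r
# Hypothetical helper: fill a pre-allocated numeric matrix row by row,
# then convert it to a data frame once, after the loop.
build_rows <- function(n, l, make_row) {
  m <- matrix(NA_real_, nrow = n, ncol = l)  # allocate all n x l cells up front
  for (i in seq_len(n)) {
    m[i, ] <- make_row(i)                    # overwrite row i in place
  }
  as.data.frame(m)                           # single conversion at the end
}

# Example: row i holds i * (1:5)
df <- build_rows(n = 3, l = 5, make_row = function(i) i * (1:5))
print(dim(df))
```

Using `seq_len(n)` rather than `1:n` keeps the loop from running when `n` is 0.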
Joshua Ulrich
  • The last version needs a `Data <- as.data.frame(Data)` after the `for` loop to get it back to the data frame the OP requires. It doesn't affect the timings, though. Nice answer! I knew data frames were slow, but it is illustrative to see how slow in such a simple case! – Gavin Simpson Oct 27 '10 at 14:56
How about pre-allocating with whatever column types you need from a list first?

as.data.frame(list(a1 = vector("numeric", n), a2 = vector("character", n)))
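
A rough sketch of how such a typed, pre-allocated data frame might then be filled one row at a time. The column names and the row values are made up for illustration, and `stringsAsFactors = FALSE` is an addition here: before R 4.0 the default would have turned the character column into a factor.

```r
n <- 3

# Pre-allocate one column per type; stringsAsFactors = FALSE keeps a2 as
# character instead of converting it to a factor (the pre-R 4.0 default).
df <- data.frame(a1 = numeric(n), a2 = character(n), stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  # Assign the whole row as a list so each column keeps its own type
  df[i, ] <- list(i * 1.5, paste0("row", i))
}

str(df)
```

Passing the row as a list rather than with `c()` matters: `c(1.5, "row1")` would coerce everything to character.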

mdsumner
  • That's certainly a good idea if you're replacing element-by-element. I'm not sure if you'd benefit when you're replacing entire rows. – Joshua Ulrich Oct 28 '10 at 14:45