
What is the fastest alternative to `rbind.fill`? I have a list of data frames (all with the same column names) and would like to combine them into one big data frame, as if `rbind` had been applied to all of them (or something equivalent). Any solution, including tidyverse, is fine.

bill999
    `dplyr::bind_rows(x)`, `data.table::rbindlist(x)`, base `do.call(rbind, x)` are all very fast, *much* faster than `rbind`ing one frame at a time (which is the second circle within the [R Inferno](https://www.burns-stat.com/pages/Tutor/R_inferno.pdf), *"Growing Objects"*). – r2evans Sep 03 '21 at 17:10
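A minimal sketch of the three calls named in the comment above, assuming a small hypothetical list `x` of data frames with identical columns. The dplyr and data.table calls are guarded with `requireNamespace()` so the snippet still runs when those packages are not installed.

```r
# Hypothetical example data: a list of data frames with identical columns
x <- list(
  data.frame(id = 1:2, val = c("a", "b")),
  data.frame(id = 3:4, val = c("c", "d")),
  data.frame(id = 5L, val = "e")
)

# Base R: a single rbind() call that receives every list element as an argument
base_result <- do.call(rbind, x)

# dplyr: only run if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_result <- dplyr::bind_rows(x)
}

# data.table: rbindlist() returns a data.table, converted back to a plain data.frame here
if (requireNamespace("data.table", quietly = TRUE)) {
  dt_result <- as.data.frame(data.table::rbindlist(x))
}

print(base_result)
```

Base `rbind` stops with an error when column names do not match, whereas `dplyr::bind_rows()` and `data.table::rbindlist(fill = TRUE)` pad missing columns with `NA`, which is the behaviour `rbind.fill` provides. With identical columns, as in the question, that difference does not matter.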

2 Answers


In a 2018 speed comparison of `rbind`, `bind_rows`, and `rbindlist` by Ashwin Malshé (https://rstudio-pubs-static.s3.amazonaws.com/406521_7fc7b6c1dc374e9b8860e15a699d8bb0.html), the results were, from fastest to slowest:

  1. `rbindlist` from data.table was the fastest, more than twice as fast as `bind_rows` from dplyr.

  2. `bind_rows` from dplyr was more than 10 times faster than `rbind` from base R.

  3. `rbind` from base R was the slowest.

All three simulations contain a few extreme values, but the medians are close to the means, which suggests the outliers have little influence on the results.
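The linked comparison dates from 2018 and package versions have changed since, so it may be worth re-running a rough comparison on your own data. The sketch below uses only base R's `system.time()` and reports the median of a few runs per method. The workload (1,000 frames of 10 rows each) is an arbitrary assumption, not the setup from the linked post, and no particular ranking is guaranteed on your machine.

```r
set.seed(42)

# Hypothetical workload (an assumption, not the linked post's setup):
# 1,000 small data frames that all share the same three columns
frames <- lapply(seq_len(1000), function(i) {
  data.frame(id = i, x = runif(10), y = sample(letters, 10, replace = TRUE))
})

# Median elapsed time in seconds over several runs of a zero-argument function
median_elapsed <- function(f, reps = 5) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

timings <- c(do_call_rbind = median_elapsed(function() do.call(rbind, frames)))

# Optional packages are timed only if installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  timings["bind_rows"] <- median_elapsed(function() dplyr::bind_rows(frames))
}
if (requireNamespace("data.table", quietly = TRUE)) {
  timings["rbindlist"] <- median_elapsed(function() data.table::rbindlist(frames))
}

# Fastest first
print(sort(timings))
```

For more careful measurements, the bench or microbenchmark packages repeat each expression many times and report the full timing distribution rather than a single median.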

TarJae

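If the data frames are in a list, the base R `Reduce` function can `rbind` them all, e.g.:

```r
# Three identical copies of the built-in mtcars data set
df1 <- mtcars
df2 <- df1
df3 <- df2
mylist <- list(df1, df2, df3)
# Reduce() applies rbind pairwise: rbind(rbind(df1, df2), df3)
rbindall <- Reduce(rbind, mylist)
```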
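`Reduce()` and `do.call()` produce the same rows here, but they do very different amounts of work. `Reduce(rbind, mylist)` makes n - 1 separate `rbind` calls, each copying the growing accumulated result, so the work grows roughly quadratically with the number of frames, while `do.call(rbind, mylist)` makes a single call. A minimal sketch; the 200-copy list is an arbitrary assumption and timings will vary by machine.

```r
# Same setup as above: three copies of mtcars in a list
mylist <- list(mtcars, mtcars, mtcars)

# Reduce() folds pairwise: rbind(rbind(df1, df2), df3), i.e. n - 1 rbind calls
via_reduce <- Reduce(rbind, mylist)

# do.call() makes one call: rbind(df1, df2, df3)
via_do_call <- do.call(rbind, mylist)

# Compare the values only, ignoring row names
same_values <- identical(unname(as.matrix(via_reduce)), unname(as.matrix(via_do_call)))
print(same_values)

# The gap widens as the list grows (200 copies is an arbitrary choice)
many <- rep(list(mtcars), 200)
print(system.time(Reduce(rbind, many)))
print(system.time(do.call(rbind, many)))
```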
SteveM
    While using `Reduce` is admirable and fairly advanced in R circles, I downvoted this because it is actually the *slowest* option performance-wise (due to memory allocation and how `rbind.data.frame` works under the hood), by orders of magnitude, even with modest data. (I don't know of any reasonable method that is slower or less memory-efficient. R is good at many things; this is not a good example of that :-).) Good reprex answer, just an antipattern. – r2evans Sep 03 '21 at 17:34