Very simple yet confusing R question about bind_rows()

Question

I'm trying to combine multiple dataframes into one dataframe using bind_rows. Each dataframe has the same column names and length. Let's say each dataframe is named "df" and there are 100 of them. So df1, df2, df3... df100.

I do not want to write each dataframe in the function as bind_rows(df1, df2 ... df100). I tried

total_df <- bind_rows(paste0(df1:df100))

It did not work. Would there be an easier way to do this? Thank you!!

`purrr::reduce()` will do the trick if all the dfs are stored in a list. If they are stored separately you'll have to gather them in a list. — jpdugo17, Jun 25 '21 at 20:44
Seems like the essence of the problem is getting many data frames into a list without explicitly naming them... this may help: https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames — Bill O'Brien, Jun 25 '21 at 20:47
Consider posting the code you used to create the data frames. Chances are it can be improved by reading the data into a list of df's from the start. — Bill O'Brien, Jun 25 '21 at 20:49

stefan · Answer 1 · 2021-06-25T20:58:21.057

The easier way would be to avoid having a bunch of dataframes named df1, ..., df100 from the start by making use of a list. Be that as it may. You could make use of lapply and get to put your df's into a list and call bind_rows on the list:

library(dplyr)

df1 <- mtcars
df2 <- mtcars
df3 <- mtcars

df_bind <- lapply(1:3, function(x) get(paste0("df", x))) %>% 
  bind_rows()

head(df_bind)
#>                        mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4...1         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag...2     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710...3        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive...4    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout...5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant...6           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

or using mget you could do

mget(paste0("df", 1:3)) %>% bind_rows()

mget %>% bind_rows is definitely the way to go here. – GuedesBF Jun 25 '21 at 21:42 — GuedesBF, Jun 25 '21 at 21:42

jpdugo17 · Answer 2 · 2021-06-25T21:13:05.203

library(tidyverse)

#generate some data
rerun(10, iris) %>% 
    walk2(str_c('df', 1:length(.)), ~assign(.y, .x, pos = .GlobalEnv))

map(str_c('df', 1:10), ~ eval(sym(.x))) %>%
    reduce(bind_rows) %>%
    as_tibble() #to avoid console flooding
#> # A tibble: 1,500 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 1,490 more rows

#if all the df's are inside a list

data <- rerun(10, iris)

reduce(data, bind_rows) %>% 
    as_tibble()
#> # A tibble: 1,500 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 1,490 more rows

^{Created on 2021-06-25 by the reprex package (v2.0.0)}

Edit: An alternative way using !!! (bang-bang-bang).

.args <- str_c('df', 1:10) %>%
    map(~eval(sym(.x)))

bind_rows(!!!.args) %>% 
    as_tibble()
#> # A tibble: 1,500 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 1,490 more rows

akaDrHouse · Answer 3 · 2021-06-26T18:20:07.160

I like using the data.table function rbindlist with the idcol parameter. That will show you the source table/df the row came from prior to binding.

I also like rbindlist because it will give you the option to fill bind tables with some columns added/missing with the fill = TRUE.

library(data.table)

df1 <- mtcars 
df2 <- mtcars 
df3 <- mtcars 
    
# Assuming your files are in working memory and have a common naming convention #
my_mem_objects <- lapply(ls(pattern = "df"), get)

# Assign names of the list to the same name they had as memory objects
names(my_mem_objects) <- ls(pattern = "df")

# Combine the dataframes and retain the name of each object the output as new column
alldata <- rbindlist(my_mem_objects,idcol = "Source_DF")

head(alldata)
tail(alldata)

Gives this output:

   Source_DF  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1:         1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2:         1 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3:         1 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4:         1 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5:         1 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6:         1 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

and:

   Source_DF  mpg cyl  disp  hp drat    wt qsec vs am gear carb
1:         3 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
2:         3 30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
3:         3 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
4:         3 19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
5:         3 15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
6:         3 21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

This is similar to another answer to this question that doesn't get at what the Asker needs. The asker is trying to figure out how to avoid the problem of typing out the names for 100 data.frames by hand in their environment: `out <- bind_rows(df1, df2, df3, df4, ..., df50, df51, df52, ..., df98, df99, df100)` — Andrew, Jun 25 '21 at 23:32
Thanks @Andrew. I did not read carefully. I have addressed that issue with one way in my edits. — akaDrHouse, Jun 26 '21 at 18:20

Very simple yet confusing R question about bind_rows()

3 Answers3

Linked