0

I am trying to merge multiple data frames. Let's assume I have the following data frames:

library(tidyverse)
id <- c(10, 10, 12, 12, 13)
data <- c(500, 600, 700, 800, 900)
data1 <- data.frame(id, data)

id <- c(10, 12, 13, 14)
data <- c(550, 850, 950, 1050)
data2 <- data.frame(id, data)

id <- c(15)
data <- c(1350)
data3 <- data.frame(id, data)

Now I want to join the data frames so that the output looks like this:

> srtdata
   id data
1  10  500
2  10  600
6  10  550
3  12  700
4  12  800
7  12  850
5  13  900
8  13  950
9  14 1050
10 15 1350

This is the way I did this.

mdata1 <- dplyr::full_join(data1, data2)
mdata2 <- dplyr::full_join(mdata1, data3)

I have to write two lines as it seems full_join can take two frames x and y at a time.

To order the data in a correct way I am using order from base R:

srtdata <- mdata2[order(mdata2$id), ]

Anyone here to help me finding a better solution?

Cettt
  • 11,460
  • 7
  • 35
  • 58
JontroPothon
  • 127
  • 1
  • 9

3 Answers3

1

Use joins when there is some overlap in your data frames. If you simply want to stack data frames then use dplyr::bind_rows. Then sort using dplyr::arrange:

bind_rows(data1, data2, data3) %>% 
    arrange(id)

#### OUTPUT ####

   id data
1  10  500
2  10  600
3  10  550
4  12  700
5  12  800
6  12  850
7  13  900
8  13  950
9  14 1050
10 15 1350

bind_rows works quite well with lists of data frames as well:

df_list <- list(data1 = data1, data2 = data2, data3 = data3)
bind_rows(df_list, .id = "dataframe") %>% 
    arrange(id)

#### OUTPUT ####

   dataframe id data
1      data1 10  500
2      data1 10  600
3      data2 10  550
4      data1 12  700
5      data1 12  800
6      data2 12  850
7      data1 13  900
8      data2 13  950
9      data2 14 1050
10     data3 15 1350
1

looks like you are looking for bind_rows:

bind_rows(data1, data2, data3) %>%
  arrange(id)

If your data.frames are all in a list you can use bind_rows on the list as well (thanks to Shinobi_Atobe for pointing out that the use of do.call is obsolete here):

mylist <- list(data1, data2, data3)
bind_rows(mylist) %>%
  arrange(id)

The last approach is useful if you have a large number of data.frames and you don't want to spell the name of each data.frame.

Cettt
  • 11,460
  • 7
  • 35
  • 58
  • I'm afraid I beat you to the first answer by a couple of minutes. It is worth mentioning `do.call` though. Maybe expand on that –  Oct 31 '19 at 10:09
  • 1
    Do you need the `do.call`? doesn't `bind_rows(mylist)` work? – Shinobi_Atobe Oct 31 '19 at 10:10
  • @Shinobi_Atobe thank you. Did not know that. I used to work with `do.call(rbind, ...)` a lot where this shortcut is not possible. – Cettt Oct 31 '19 at 10:13
1

You can also use rbindlist from the data.table package.

require(data.table)
rbindlist(list(data1, data2, data3))[order(id)]

    id data
 1: 10  500
 2: 10  600
 3: 10  550
 4: 12  700
 5: 12  800
 6: 12  850
 7: 13  900
 8: 13  950
 9: 14 1050
10: 15 1350
JDG
  • 1,342
  • 8
  • 18