85

How do I join multiple dataframes in R using dplyr ?

new <- left_join(x,y, by = "Flag")

this is the code I am using to left join x and y the code doesn't work for multiple joins

new <- left_join(x,y,z by = "Flag")
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
pranav
  • 1,041
  • 1
  • 10
  • 11
  • 1
    [This is how you join multiple data sets in R](http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list) usually. You can use `left_join` instead of `merge` if you like. – David Arenburg Aug 18 '15 at 07:49
  • 3
    Use `Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="index"), list(x,y,z))`. Since this question is marked as duplicate, I gave a detailed answer with example on the question [Simultaneously merge multiple data.frames in a list](http://stackoverflow.com/a/34393416/2641825). – Paul Rougieux Dec 21 '15 at 10:29

1 Answers1

187

You can use a nested left_join

 library(dplyr)
 left_join(x, y, by='Flag') %>%
                left_join(., z, by='Flag') 

Or another option would be to place all the datasets in a list and use merge from base R with Reduce

Reduce(function(...) merge(..., by='Flag', all.x=TRUE), list(x,y,z))

Or we have join_all from plyr. Here we place the dataframes in a list and use the argument type='left' for a left join.

library(plyr)
join_all(list(x,y,z), by='Flag', type='left')

As @JBGruber mentioned in the comments, it can also be done via purrr

library(purrr)
library(dplyr)
purrr::reduce(list(x,y,z), dplyr::left_join, by = 'Flag')
maxshuty
  • 9,708
  • 13
  • 64
  • 77
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 10
    I guess you can just embed `left_join` into `Reduce` too, but it seems this was already asked a few times yesterday, no? – David Arenburg Aug 18 '15 at 07:47
  • @DavidArenburg Yes, it can be and is useful for multiple datasets. I just thought to use the `%>%` . – akrun Aug 18 '15 at 07:49
  • 6
    The pipe option and reduce with join_left are much faster (1.8s) (~10x faster in my case- conditional to your data of course etc..). Reduce with merge is very slow (16s) but if you replace merge with left_join then you have comparable speed as with the pipe (wee bit slower 1.9s on average but not significant). The slowest is join_all from plyr (22s). – R. Prost Feb 08 '18 at 07:40
  • @R.Prost I think `dplyr` is more optimized for those joins – akrun Feb 08 '18 at 07:47
  • this is great... it seems obvious that the `.` in the second join is a placeholder for the first join, but can anyone explain that in functional programming terms? why is the `.` deemed as the placeholder? Is this something to do with NSE? – dre Sep 14 '19 at 13:10
  • @akrun How can i force R to retain the names of the dataframes in the columns of the new object? I am getting "value.x", "value.x.x." and so on. – nouse Jul 26 '20 at 10:36
  • 1
    @nouse You could select only the columns that are really needed so that othere than the `by` columns, there would be unique names in both datasets. If there are duplicate column names, that will append the `.x` `x.x` etc. – akrun Jul 26 '20 at 19:27
  • Is there a way to integrate suffixes into to the nested left_join option? For example if you wanted to assign "_T1", "_T2"...etc to all columns in each data frame. – Addison Dec 04 '20 at 23:09
  • @AddisonMaerz there is a default option `suffix = c(".x", ".y")` which you can change – akrun Dec 04 '20 at 23:12
  • @akrun but how would it work if you are joining more than two datasets? My understanding is that the character vector for that option has to be two. So in the second level of left_join, is there some place holder you can enter (e.g., c(., ".z") – Addison Dec 04 '20 at 23:32
  • @AddisonMaerz can you post as a new question as it is not clear abou tyour particular case – akrun Dec 04 '20 at 23:32
  • Can you please explain the use of the `.` in the 2nd join? Perhaps a reference link? I can' seem to find information about this anywhere. Thanks! – Danny Bullis Jul 05 '21 at 00:13
  • 1
    @DannyBullis In tidyverse, the `.` refers to the entire object from the lhs of `%>%` So, it is the data.frame created from the left join of x and y – akrun Jul 05 '21 at 17:04
  • 1
    Hey @akrun, this is the first hit on Google and it's a great answer. Could you update it to include `purrr::reduce` since this is, I believe, the best current option `purrr::reduce(list(x,y,z), dplyr::left_join, by = 'Flag')`. – JBGruber Jan 31 '22 at 08:45
  • 1
    Loved the `reduce() merge()` combination. Say you collected from the environment a list of frequency dataframes and you wanted to combine them by a 'hallmarks' column: `df_list=mget((ls(pattern=".*_freq"))) Reduce(function(...) merge(..., by='hallmarks', all.x=TRUE), df_list )` – bioSlayer Sep 29 '22 at 00:17