2

Here are 2 data frames:

df1 <- data.frame(ID=c(1,2,3,4))
df1
df2 <- data.frame(ID=c(1,3))
df2

How can I join them to get the following output?:

#  ID.1 ID.2
#     1    1
#     2 
#     3    3
#     4 

Thanks!

Rez99
  • 359
  • 1
  • 4
  • 15
  • 1
    Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)?](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – divibisan Mar 28 '19 at 18:53
  • What are you trying to join on? Is this the actual problem you have, or does your actual dataset have an ID column that you can do a normal join on? – divibisan Mar 28 '19 at 18:55
  • I'm trying to join on the ID column. I should have mentioned that I'm looking for a dplyr solution if possible. But dplyr joins seem to always remove duplicate columns by default, so I can't get the output I was looking for. – Rez99 Mar 28 '19 at 18:57
  • Why do you want to have the column duplicated, since it doesn't contain information which is not available elsewhere? – Kerry Jackson Mar 28 '19 at 19:24
  • Thanks for asking - the 2nd column represents a subset of customer IDs that are, say, "female" - and I need to call this out. So I'm using the 2nd column to label customers from my first column. I thought of using a mutate operation instead of a join, but I have tens of millions of rows in my 2nd data frame and so I thought a join would be more efficient. – Rez99 Mar 28 '19 at 19:39
  • `dplyr::left_join` with `keep = TRUE` should work – Dmitry Zotikov Nov 05 '21 at 17:36

3 Answers3

5

Try dplyr::left_join with keep = TRUE:

> left_join(df1, df2, keep = TRUE, suffix = c('.1', '.2'), by = 'ID')
  ID.1 ID.2
1    1    1
2    2   NA
3    3    3
4    4   NA
Dmitry Zotikov
  • 2,133
  • 15
  • 12
2

An option would be to use match

data.frame(ID.1 = df1$ID, ID.2 = df2$ID[match(df1$ID, df2$ID)])
#   ID.1 ID.2
#1    1    1
#2    2   NA
#3    3    3
#4    4   NA
akrun
  • 874,273
  • 37
  • 540
  • 662
2

You could duplicate the ID column in df2 prior to the join:

library(tidyverse)

df1 <- data.frame(ID=c(1,2,3,4))
df2 <- data.frame(ID=c(1,3)) %>% 
  mutate(ID.2 = ID)

df1 %>% 
  left_join(df2, by = c("ID" = "ID"))

  ID ID.2
1  1    1
2  2   NA
3  3    3
4  4   NA
Jordo82
  • 796
  • 4
  • 14
  • Or if you want to have the duplication within the piping `library(dplyr) df1 %>% rename(ID.1 = ID) %>% left_join(df2 %>% mutate(ID.2 = ID), by = c("ID.1" = "ID"))` – Kerry Jackson Mar 28 '19 at 19:22