How to add a column from one dataframe to a different dataframe based on an ID in R?

Question

I have two different data frames composed of strings (several thousands of rows). They each have a different number of entries. I am trying to find a code that will allow me to associate the content of the column "characteristic" in df2 into df1 (see example below) :

Df1 :

ID	Information
11AA	info1
22BB	info2
33CC	info3
44DD	info4
11AA	info1
22BB	info2

Df2:

ID	Characteristic
11AA	char1
22BB	char2
33CC	char3
44DD	char4

The intended result would be :

Df3:

ID	Information	Characteristic
11AA	info1	char1
22BB	info2	char2
33CC	info3	char3
44DD	info4	char4
11AA	info1	char1
22BB	info2	char2

I tried using inner_join(df1,df2, by="ID") but the resulting dataframe (df3) often has more rows than the original (df1) and I need that the resulting dataframe (df3) keeps the same structure than the original one (df1), I just need to add the extra column.

Is there a built in function for this type of operation?

The presence of more rows suggests that you have duplicate `ID` in either one of your input frames. That does not occur with your sample data, so you won't get it. If you want to remove the duplicates, you need to provide (to us and to R) some logic on how to do that, there is not general way to always do that. — r2evans, Aug 20 '21 at 17:55

score 3 · Answer 1 · answered Aug 20 '21 at 17:52

merge(Df1, Df2, by = "ID")
#     ID Information Characteristic
# 1 11AA       info1          char1
# 2 22BB       info2          char2
# 3 22BB       info2          char2
# 4 33CC       info3          char3
# 5 44DD       info4          char4

If you need non-matches too (since your data has 11AB in one and not the other), then use all=TRUE (or all.x= or all.y, your choice).

merge(Df1, Df2, by = "ID", all=TRUE)
#     ID Information Characteristic
# 1 11AA       info1          char1
# 2 11AB       info1           <NA>
# 3 22BB       info2          char2
# 4 22BB       info2          char2
# 5 33CC       info3          char3
# 6 44DD       info4          char4

Data

Df1 <- structure(list(ID = c("11AA", "22BB", "33CC", "44DD", "11AB", "22BB"), Information = c("info1", "info2", "info3", "info4", "info1", "info2")), class = "data.frame", row.names = c(NA, -6L))
Df2 <- structure(list(ID = c("11AA", "22BB", "33CC", "44DD"), Characteristic = c("char1", "char2", "char3", "char4")), class = "data.frame", row.names = c(NA, -4L))

Actually, I was thinking it's a dupe of https://stackoverflow.com/q/1299871/3358272. I'm waiting for them to come back with anything that explains the "more rows" problem. — r2evans, Aug 20 '21 at 18:11
Thank you so much for your answer. Actually the 11AB entry was a typo, I'm going to edit the example so it will be clear. I tried the merge() function but i still get more rows than the original dataframe. However it is entirely possible that there are duplicates in the original dataframe, which I should not remove. I also think I previously saw the post you're linking here. Is there a way I could compare my resulting dataframe with the original one to see which are the rows that are being added ? — Juan, Aug 20 '21 at 18:29
Juan, your updated data and your `inner_join` code produces 6 rows, so there is no evidence there of your "extra rows". — r2evans, Aug 28 '21 at 20:54

score 2 · Answer 2 · answered Aug 20 '21 at 18:51

2

You are looking for an outer join.

library(dplyr)

df3<- left_join(df1, df2, by = c("ID"))

answered Aug 20 '21 at 18:51

d3hero23

380
1
12

An "outer join" would be `full_join`, not `left_join`. Some good references for merges/join (both in R and SQL) are: https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272 – r2evans Nov 23 '22 at 13:04

How to add a column from one dataframe to a different dataframe based on an ID in R?

2 Answers2