How to add a column from a df to another df based on other columns?

Question

Hi I have two data frames. One with missing data that I would like to add from the other data frame which has complete data.

This is the first data frame with the complete data.

ID	Name	Age
012	Dave	45
013	Colin	63
014	Dave	52

This is the second database that I would like to join with the first and take the ID from the first using both the name and age column.

ID	Name	Age
	Dave	45
	Colin	63
	Dave	52

Does this answer your question? [R - Add a new column to a dataframe using matching values of another dataframe](https://stackoverflow.com/questions/37034242/r-add-a-new-column-to-a-dataframe-using-matching-values-of-another-dataframe) — Cecilia López, Dec 02 '21 at 09:36

score 1 · Answer 1 · answered Dec 02 '21 at 09:46

Take two datasets:

df1 <- data.frame(Name=c("Andy", "Henry", "Tom", "Andy", "Martin"),
                  Age =c(55, 46, 32, 38, 61),
                  ID  =c(1:5))


df2 <- data.frame(Name=c("Andy", "Henry", "Tom", "Andy", "Alex"),
                  Age =c(55, 46, 32, 38, 24))

Here are four different kinds of merge you can do depending on what you want to do with missing observations:

merge(df1, df2, on=c("Name", "Age"))
merge(df1, df2, on=c("Name", "Age"), all.x=TRUE)
merge(df1, df2, on=c("Name", "Age"), all.y=TRUE)
merge(df1, df2, on=c("Name", "Age"), all=TRUE)

Alternatively you could use data.table joins:

data.table(df1)[data.table(df2), on=c("Name", "Age")]

More correctly:

setDT(df1)
setDT(df2)
setkey(df1, "Name", "Age")
setkey(df2, "Name", "Age")
df1[df2]

score 0 · Answer 2 · answered Dec 02 '21 at 09:37

That's one solution to your problem :

library(dplyr)

my_df1 <- data.frame("ID" = c("012", "013", "014"),
                 "Name" = c("Dave", "Collin", "Dave"),
                 "Age" = c(45, 63, 52),
                 stringsAsFactors = FALSE)

my_df2 <- data.frame("Name" = c("Dave", "Collin", "Dave"),
                 "Age"= c(45, 63, 52),
                 stringsAsFactors = FALSE)


my_df3 <- left_join(my_df2, my_df1, by = c("Name", "Age"))

score 0 · Answer 3 · answered Dec 02 '21 at 09:42

0

This also works:

ID <- c("012", "013", "014")
Name <- c("Dave", "Colin", "Dave")
Age <- c(45, 63, 52)
df1 <- data.frame(ID, Name, Age)
df2 <- data.frame(Name, Age)

newtable <- merge(df1, df2, by  = c("Name", "Age"))

answered Dec 02 '21 at 09:42

mark_1985

146
1
2
10

Rfanatic · Answer 4 · 2021-12-02T10:00:02.070

library(dplyr)

my_df1 <- data.frame("ID" = c("012", "013", "014"),
                 "Name" = c("Dave", "Collin", "Dave"),
                 "Age" = c(45, 63, 52),
                 stringsAsFactors = FALSE)

my_df2 <- data.frame("Name" = c("Dave", "Collin", "Dave"),
                 "Age"= c(45, 63, 52),
                 stringsAsFactors = FALSE)

You can use merge()

my_df3 <- merge(df2, df1, by = c("Name", "Age"))
my_df3

or you can use rbind.fill()to first combine different data frames into one and later delete the missing NA's

df3<-rbind.fill(df1, df2)
drop_na(df3) #drop NA's



  ID   Name Age
1  012   Dave  45
2  013 Collin  63
3  014   Dave  52
4 <NA>   Dave  45
5 <NA> Collin  63
6 <NA>   Dave  52

How to add a column from a df to another df based on other columns?

4 Answers4