1

Hi I have two data frames. One with missing data that I would like to add from the other data frame which has complete data.

This is the first data frame with the complete data.

ID Name Age
012 Dave 45
013 Colin 63
014 Dave 52

This is the second database that I would like to join with the first and take the ID from the first using both the name and age column.

ID Name Age
Dave 45
Colin 63
Dave 52
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 2
    Does this answer your question? [R - Add a new column to a dataframe using matching values of another dataframe](https://stackoverflow.com/questions/37034242/r-add-a-new-column-to-a-dataframe-using-matching-values-of-another-dataframe) – Cecilia López Dec 02 '21 at 09:36

4 Answers4

1

Take two datasets:

df1 <- data.frame(Name=c("Andy", "Henry", "Tom", "Andy", "Martin"),
                  Age =c(55, 46, 32, 38, 61),
                  ID  =c(1:5))


df2 <- data.frame(Name=c("Andy", "Henry", "Tom", "Andy", "Alex"),
                  Age =c(55, 46, 32, 38, 24))

Here are four different kinds of merge you can do depending on what you want to do with missing observations:

merge(df1, df2, on=c("Name", "Age"))
merge(df1, df2, on=c("Name", "Age"), all.x=TRUE)
merge(df1, df2, on=c("Name", "Age"), all.y=TRUE)
merge(df1, df2, on=c("Name", "Age"), all=TRUE)

Alternatively you could use data.table joins:

data.table(df1)[data.table(df2), on=c("Name", "Age")]

More correctly:

setDT(df1)
setDT(df2)
setkey(df1, "Name", "Age")
setkey(df2, "Name", "Age")
df1[df2]
rg255
  • 4,119
  • 3
  • 22
  • 40
0

That's one solution to your problem :

library(dplyr)

my_df1 <- data.frame("ID" = c("012", "013", "014"),
                 "Name" = c("Dave", "Collin", "Dave"),
                 "Age" = c(45, 63, 52),
                 stringsAsFactors = FALSE)

my_df2 <- data.frame("Name" = c("Dave", "Collin", "Dave"),
                 "Age"= c(45, 63, 52),
                 stringsAsFactors = FALSE)


my_df3 <- left_join(my_df2, my_df1, by = c("Name", "Age"))
DataM
  • 351
  • 1
  • 7
0

This also works:

ID <- c("012", "013", "014")
Name <- c("Dave", "Colin", "Dave")
Age <- c(45, 63, 52)
df1 <- data.frame(ID, Name, Age)
df2 <- data.frame(Name, Age)

newtable <- merge(df1, df2, by  = c("Name", "Age")) 
mark_1985
  • 146
  • 1
  • 2
  • 10
0
library(dplyr)

my_df1 <- data.frame("ID" = c("012", "013", "014"),
                 "Name" = c("Dave", "Collin", "Dave"),
                 "Age" = c(45, 63, 52),
                 stringsAsFactors = FALSE)

my_df2 <- data.frame("Name" = c("Dave", "Collin", "Dave"),
                 "Age"= c(45, 63, 52),
                 stringsAsFactors = FALSE)

You can use merge()

my_df3 <- merge(df2, df1, by = c("Name", "Age"))
my_df3

or you can use rbind.fill()to first combine different data frames into one and later delete the missing NA's

df3<-rbind.fill(df1, df2)
drop_na(df3) #drop NA's



  ID   Name Age
1  012   Dave  45
2  013 Collin  63
3  014   Dave  52
4 <NA>   Dave  45
5 <NA> Collin  63
6 <NA>   Dave  52
Rfanatic
  • 2,224
  • 1
  • 5
  • 21