1

I have two dataframes that have mostly the same variables and I want to compare the cases of the two dataframes. I want to create a new dataframe with all the cases that are the same in df1 and df2.

Cases are assumed to be the same if all values of the variables that are present in both dataframes are the same. There is an exception for the variable "Age", where cases are assumed to be the same if the values have a difference of maximum 1 year and the variable "Time" where a difference of 1 hour is acceptable.

ID1 <- c(100, 101, 102, 103)
V1 <- c(1, 1, 2, 1)
V2 <- c(1, 2, 3, 4)
Age <- c(25, 16, 74, 46)
Time <- c("9:30", "13:25", "17:20", "7:45")
X <- c (1, 3, 4, 1)

df1 <- data.frame(ID1, V1, V2, Age, Time, X)


ID2 <- c(250, 251, 252, 253)
V1 <- c(1, 2, 1, 2)
V2 <- c(1, 2, 2, 4)
Age <- c(26, 55, 16, 80)
Time <- c("9:30", "12:00", "12:55", "18:00")
Y <- c (3, 2, 1, 1)

df2 <- data.frame(ID2, V1, V2, Age, Time, Y)

In this example ID1=100 and ID2=250 are the same and also ID1=101 and ID2=252.

I'd like to have a new dataframe-output like this one

Note that it is not important if the values for "Age" and "Time" are taken from df1 or df2. The important Variables are X an Y.

I hope someone can help me out with this problem. Thanks a lot in advance :)

Kind regards Philip

Phil
  • 11
  • 4
  • As an answer below suggests, base R does exact merges very well with `merge`. (There are also similar tidyverse functions.) But allowing inexact matching is more complicated. See this: https://stackoverflow.com/questions/2231993/merging-two-data-frames-using-fuzzy-approximate-string-matching-in-r – sashahafner Nov 19 '21 at 11:05
  • Thank you Sasha for your message and the link. Unfortunately, I didn't find a proposed solution that is applicable to my problem here – Phil Nov 23 '21 at 16:54

1 Answers1

0

In base R:

df3 <- merge(df1, subset(df2, select = -c(Age, Time)), by = c("V1", "V2"))
df3[,c("ID1", "ID2", "V1", "V2", "Age", "Time", "X", "Y")]
koolmees
  • 2,725
  • 9
  • 23
  • Thank you for your answer but what you suggest is an exact merge. What I need to have is an exact merge for V1 and V2 but for "Age" a difference of 1 year should be allowed and for "Time" a difference of 1 hour. – Phil Nov 23 '21 at 08:04