1

I would like to find matching observations in two different datasets based on two variables.

The first dataset "df1" exists of the following two variables:

SessionID   MarkerID
14  5
14  5
14  5
14  8
17  9
17  9
17  8
17  2
17  9

The othere dataset "df2" exists of the same two variables

SessionID   MarkerID
14  5
17  8
17  2

Now, I would like to add another variable "Match" to df1 that shows if a match was found between the two datasets (Match = 1) or not (Match = 0) for an observation. The observation should have the same value for both the SessionID AND MarkerID.

The desired output looks as follows:

SessionID   MarkerID    Match
14  5    1
14  5    1
14  5    1
14  8    0
17  9    0
17  9    0
17  8    1
17  2    1
17  9    0

Reproducable example:

SessionID <- c(14,14,14,14,17,17,17,17,17)
MarkerID <- c(5,5,5,8,9,9,8,2,9)
df1 <- as.data.frame(cbind(SessionID, MarkerID))

SessionID <- c(14,17,17)
MarkerID <- c(5,8,2)
df2 <- as.data.frame(cbind(SessionID,MarkerID))

I have tried the following code but it did not produce the desired output:

df1$Match <- 0 
df1$Match[which(df1$MarkerID == df2$MarkerID & df1$SessionID == df2$SessionID )] <- 1
olive
  • 179
  • 1
  • 11

3 Answers3

1

Here is a possibility using match

df1$Match <- ifelse(is.na(match(
    paste0(df1$SessionID, df1$MarkerID, sep = "_"),
    paste0(df2$SessionID, df2$MarkerID, sep = "_"))), 0, 1)
df1;
#  SessionID MarkerID Match
#1        14        5     1
#2        14        5     1
#3        14        5     1
#4        14        8     0
#5        17        9     0
#6        17        9     0
#7        17        8     1
#8        17        2     1
#9        17        9     0

Explanation: We concatenate SessionID and MarkerID entries in both data.frames and use match to identify matching rows; ifelse marks matching entries with 1 and NA (non-matching) entries with 0.

If you want to avoid ifelse you can also do

df1$Match <- as.numeric(!is.na(match(
    paste0(df1$SessionID, df1$MarkerID, sep = "_"),
    paste0(df2$SessionID, df2$MarkerID, sep = "_"))))
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
1

This works for me:

df1$Match <- as.numeric(do.call(paste, c(df1, sep = "-")) %in% do.call(paste, c(df2, sep = "-")))
Oscar
  • 407
  • 6
  • 16
  • That's elegant. For robustness I would add a `sep` argument to distinguish between hypothetical cases `SessionID=14, MarkerID=5` and `SessionID=1, MarkerID=45`, which in your current case both get pasted to give `"145"`. – Maurits Evers Jun 20 '18 at 11:20
  • You are completely right, I edited my answer. – Oscar Jun 20 '18 at 11:35
0

You can use left_join

df1 %>%
 left_join(df2 %>% mutate(Match = 1), by = c('SessionID', 'MarkerID')) %>%
 mutate(Match = ifelse(is.na(Match), 0 , Match))

#      SessionID MarkerID Match
# 1        14        5     1
# 2        14        5     1
# 3        14        5     1
# 4        14        8     0
# 5        17        9     0
# 6        17        9     0
# 7        17        8     1
# 8        17        2     1
# 9        17        9     0
Thor6
  • 781
  • 6
  • 9