0

I am trying to use the %in% function to match the observations of one of my datasets to the observations of another dataset. Essentially, I would like to make two new datasets, one that contains the observations of the second dataset, and another which contains all other observations. Here is an example dataset:

Df
Last.Name   First.Name        Group
Williams      Bob                A
Williams      Dan                C
Miller        Bob                A
Smith         Dan                C
Williams      Rick               A
Smart         Jeff               C
Miller        Bob                A
Smith         Dan                C
Jones         Bob                A
Williams      Buddy              C
Miller        Bob                A
Hends         Dan                C
Williams      Rick               A
Smart         Jeff               C
Millers       Bob                A
Smith         Danny              C

Here is a dataset that I am trying to match the observations:

dfMatch
LastName   FirstName   
Williams      Bob          
Williams      Buddy                     
Miller        Bob                  
Smith         Dan                  
Williams      Rick                  
Smart         Jeff                  
Miller        Bob                 
Smith         Dan                 

I tried various versions of the following code:

newdf<-Df[ Df$Last.Name %in% DfMatch$LastName & Df$First.Name %in% DfMatch$FirstName,]

and

 newdf<-Df[ which(Df$Last.Name %in% DfMatch$LastName & Df$First.Name %in% DfMatch$FirstName),]

To get this new dataset:

newDF
Last.Name   First.Name          Group
Williams      Bob                  A
Miller        Bob                  A
Smith         Dan                  C
Williams      Rick                 A
Smart         Jeff                 C
Miller        Bob                  A
Smith         Dan                  C
Williams      Buddy                C

However, this does not work.

I would also like to use similar code to build a dataset which includes all observations not listed in the dfMatch set, such as:

DfNoMatch
Last.Name   First.Name          Group
Williams      Dan                  C
Jones         Bob                  A
Miller        Bob                  A
Hends         Dan                  C
Williams      Rick                 A
Smart         Jeff                 C
Millers       Bob                  A
Smith         Danny                C

By using code similar to:

 DfNoMatch<-Df[ !Df$Last.Name %in% DfMatch$LastName & Df$First.Name %in% DfMatch$FirstName,]

and

 DfNoMatch<-Df[! which(Df$Last.Name %in% DfMatch$LastName & Df$First.Name %in% DfMatch$FirstName),]

Thank you in advance and any help is greatly appreciated!

JeffB
  • 139
  • 1
  • 10
  • 1
    You need joins for this. With `dplyr`: `rows_in_both = inner_join(Df, dfMatch, by = c("Last.Name" = "LastName", "First.Name" = "FirstName")`, `DfNoMatch = anti_join(Df, dfMatch, by = c("Last.Name" = "LastName", "First.Name" = "FirstName")` – Gregor Thomas Jun 01 '21 at 14:33
  • I guess after I look through this, it may be best if I want to filter, rather than match. Is there a more efficient way to do this? – JeffB Jun 01 '21 at 14:50

1 Answers1

1

To really match the observations use the match-function. The %in%-function only tells you that there is a match, but it doesn't tell you what is matched where.

Jonas
  • 1,760
  • 1
  • 3
  • 12