
I'm new to R, and from what I've been reading this one seems a bit hard for me. I have two data frames, say DF1 and DF2, both of which contain a variable of interest, say idFriends. I want to create a new data frame from DF1 in which every row whose idFriends value does not appear in DF2 is deleted.

The catch is that in DF2 each value appears only once, while DF1 has thousands of rows, many of them with repeated values. I don't want R to remove those repetitions. I just want it to check, for EACH row of DF1, whether its idFriends value exists in DF2: if it doesn't, delete that row; if it does, leave the row as is.
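A minimal reproducible sketch of this setup may make the requirement concrete. The values below are hypothetical (only the names DF1, DF2 and idFriends come from the question, and the extra `value` column is invented for illustration); `expected` spells out the desired result literally, with the repeated ids in DF1 kept.

```r
# Hypothetical toy data: idFriends repeats in DF1 but is unique in DF2
DF1 <- data.frame(
  idFriends = c(1, 2, 2, 3, 4, 4, 4),
  value     = c("a", "b", "c", "d", "e", "f", "g")
)
DF2 <- data.frame(idFriends = c(2, 4))

# Desired result: every DF1 row whose idFriends is in DF2 (2 or 4),
# duplicates included; rows with ids 1 and 3 are gone
expected <- data.frame(
  idFriends = c(2, 2, 4, 4, 4),
  value     = c("b", "c", "e", "f", "g")
)
print(expected)
```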

I hope it's clear.

Brian Tompsett - 汤莱恩
matiasg

2 Answers


dplyr has a semi_join function that does exactly that (and anti_join for the opposite).

library(dplyr)  # provides %>%, semi_join() and anti_join()
DF1 %>% semi_join(DF2, by = "idFriends") # keep rows with matching ID
DF1 %>% anti_join(DF2, by = "idFriends") # keep rows without matching ID
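A small sketch on hypothetical toy data (the ids and the `value` column are made up for illustration) shows that these filtering joins keep repeated ids in DF1 and split the rows into matching and non-matching sets:

```r
library(dplyr)

# Hypothetical toy data: repeated ids in DF1, unique ids in DF2
DF1 <- data.frame(
  idFriends = c(1, 2, 2, 3, 4, 4, 4),
  value     = c("a", "b", "c", "d", "e", "f", "g")
)
DF2 <- data.frame(idFriends = c(2, 4))

kept    <- DF1 %>% semi_join(DF2, by = "idFriends")  # ids found in DF2
dropped <- DF1 %>% anti_join(DF2, by = "idFriends")  # ids not found in DF2

print(kept)     # 5 rows: ids 2, 2, 4, 4, 4 (duplicates kept)
print(dropped)  # 2 rows: ids 1, 3
```

Unlike inner_join(), a filtering join never adds DF2's columns to the result and never duplicates DF1 rows, even if an id were repeated in DF2.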
Thierry

Hard to say without a reproducible example, but %in% is probably what you are looking for:

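DF1[DF1$idFriends %in% DF2$idFriends, ]   # keep rows whose idFriends appears in DF2
DF1[!DF1$idFriends %in% DF2$idFriends, ]  # the opposite: keep rows whose idFriends is absent from DF2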
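To see why this works row by row, here is a sketch on hypothetical toy data (values invented for illustration). `%in%` returns one TRUE/FALSE per element of `DF1$idFriends`, so indexing DF1 with that vector keeps or drops each row individually and leaves repeated ids untouched; negating it with `!` gives the complement.

```r
# Hypothetical toy data: repeated ids in DF1, unique ids in DF2
DF1 <- data.frame(
  idFriends = c(1, 2, 2, 3, 4, 4, 4),
  value     = c("a", "b", "c", "d", "e", "f", "g")
)
DF2 <- data.frame(idFriends = c(2, 4))

# One logical per DF1 row: is this row's id present in DF2?
in_df2 <- DF1$idFriends %in% DF2$idFriends
print(in_df2)  # FALSE TRUE TRUE FALSE TRUE TRUE TRUE

kept    <- DF1[in_df2, ]   # matching rows, duplicates included
dropped <- DF1[!in_df2, ]  # rows whose id is absent from DF2
print(kept)
```

`subset(DF1, idFriends %in% DF2$idFriends)` gives the same rows. Note that the result keeps DF1's original row names (2, 3, 5, 6, 7 here); `rownames(kept) <- NULL` resets them if needed.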
C_Z_