Deleting rows from a data frame that are not present in another data frame in R

Question

I'm new to R but from what I've been reading this one is a bit hard for me. I have two data frames, say DF1 and DF2, both of which have a variable of interest, say idFriends, and I want to create a new data frame where all the rows that do not appear in DF2 are deleted from DF1 based on the values of idFriends.

The thing is that in DF2 each value appears only once while DF1 has thousands of values, many of them repeated. BUT I don't want R to delete repetitions, I just want it to search DF2, see if EACH value of DF1 exists in DF2, and if it doesn't exist delete that row and if it exists leave it as is, and do the same for each row in DF1.

I hope it's clear.

score 4 · Answer 1 · answered Oct 09 '15 at 15:57

dplyr has a semi_join function that does that.

library(dplyr)
DF1 %>% semi_join(DF2, by = "idFriends") # keep rows of DF1 with a matching ID in DF2
DF1 %>% anti_join(DF2, by = "idFriends") # keep rows of DF1 without a matching ID in DF2
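
To check that repeated values in DF1 survive the filter, here is a minimal sketch on made-up data. Only the column name idFriends comes from the question; the values and the extra value column are assumptions for illustration.

```r
library(dplyr)

# Hypothetical sample data: DF1 has repeated ids, DF2 lists each id once.
DF1 <- data.frame(idFriends = c(1, 2, 2, 3, 4, 4), value = letters[1:6])
DF2 <- data.frame(idFriends = c(2, 4, 5))

# Rows of DF1 whose idFriends appears in DF2; repeats in DF1 are kept.
kept <- semi_join(DF1, DF2, by = "idFriends")

# Rows of DF1 whose idFriends does not appear in DF2.
dropped <- anti_join(DF1, DF2, by = "idFriends")

kept$idFriends     # 2 2 4 4
dropped$idFriends  # 1 3
```

Unlike inner_join, semi_join never adds columns from DF2 and never duplicates rows of DF1, so kept and dropped together always account for every row of DF1.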

Thierry

score 2 · Answer 2 · answered Oct 09 '15 at 15:01

Hard to say without a reproducible example, but %in% is probably what you are looking for:

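DF1[DF1$idFriends %in% DF2$idFriends, ] # keep rows of DF1 whose idFriends appears in DF2
DF1[!(DF1$idFriends %in% DF2$idFriends), ] # negated: keep rows whose idFriends does NOT appear in DF2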
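
Since the question has no reproducible example, here is a rough base R sketch on made-up data showing what %in% returns and how it can be used to subset. The data values and the names in_DF2 and result are assumptions, not from the question.

```r
# Hypothetical sample data: DF1 has repeated ids, DF2 lists each id once.
DF1 <- data.frame(idFriends = c(1, 2, 2, 3, 4, 4), value = letters[1:6])
DF2 <- data.frame(idFriends = c(2, 4, 5))

# One TRUE/FALSE per row of DF1: is this row's id present anywhere in DF2?
in_DF2 <- DF1$idFriends %in% DF2$idFriends
in_DF2  # FALSE TRUE TRUE FALSE TRUE TRUE

# Keep the rows where the test is TRUE; repeated ids in DF1 are left as they are.
result <- DF1[in_DF2, , drop = FALSE]
result$idFriends  # 2 2 4 4

# Negating the test gives the rows that would be deleted instead.
DF1$idFriends[!in_DF2]  # 1 3
```

The result keeps the original row names from DF1 (2, 3, 5, 6 here); rownames(result) <- NULL renumbers them if that matters.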

C_Z_
