I have two data frames df1 and df2. They have the same (two) columns. I want to remove the rows from df1 that are in df2.
Related: http://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in – Ben Jun 06 '12 at 05:11
3 Answers
You can do that with several packages. But here's how to do it with base R.
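df1 <- matrix(1:6, ncol = 2, byrow = TRUE)
df2 <- matrix(1:10, ncol = 2, byrow = TRUE)
combined <- rbind(df1, df2)  # stack the rows of both matrices
# Keep only rows that occur exactly once in the stacked matrix: not flagged as a
# duplicate scanning from the top (fromLast = FALSE) or from the bottom (fromLast = TRUE).
# Here every row of df1 is also in df2, so what remains are the rows of df2 not in df1.
combined[!duplicated(combined, fromLast = FALSE) & !duplicated(combined, fromLast = TRUE), ]
     [,1] [,2]
[1,]    7    8
[2,]    9   10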
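
To see how the two duplicated() calls interact, here is a small sketch on a hypothetical toy matrix m (not part of the answer) in which one row occurs three times. The names first_pass, last_pass and keep are only illustrative.

```r
# Hypothetical toy matrix: row (1, 2) appears three times, row (3, 4) once.
m <- rbind(c(1, 2), c(3, 4), c(1, 2), c(1, 2))

# Scanning from the top flags every repeat after the first occurrence.
first_pass <- duplicated(m, fromLast = FALSE)
# Scanning from the bottom flags every repeat before the last occurrence.
last_pass <- duplicated(m, fromLast = TRUE)

# A row survives only if neither scan flags it, i.e. it occurs exactly once.
keep <- !first_pass & !last_pass
unique_rows <- m[keep, , drop = FALSE]
print(unique_rows)
```

Only the row 3 4 is kept: each copy of the repeated row is flagged by at least one of the two scans, so rows duplicated more than twice are excluded as well.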

Pierre Lapointe
I like this, but I don't understand it. I like it because (unlike the other answer) it leaves all three variables in my dataframes intact (the other answer deletes those and adds a new "v1" that seems to be the row.names from one of the original DFs). I don't understand it, as this is the first time I have seen "!duplicated" used, plus (or "and"?) I don't understand how the "fromLast" "FALSE" and "TRUE" parts work. Something else to study. But it is an elegant solution. – WGray May 14 '14 at 01:47
Hi, does the solution also work for rows that are duplicated more than twice? – eclairs Apr 10 '17 at 09:41
@eclairs If I understand your question correctly, yes, this solution will exclude multiple duplicates. – Pierre Lapointe Apr 10 '17 at 12:21
Try this:
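df2 <- matrix(1:6, ncol = 2, byrow = TRUE)
df1 <- matrix(1:10, ncol = 2, byrow = TRUE)
# Note: setdiff() compares each column separately, not whole rows.
# That gives the right answer for this data, but not for arbitrary rows.
data.frame(v1 = setdiff(df1[, 1], df2[, 1]), v2 = setdiff(df1[, 2], df2[, 2]))
  v1 v2
1  7  8
2  9 10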
Note that df1 and df2 are the same as in Lapointe's answer but swapped, because you want to remove the rows from df1 that are in df2: setdiff(x, y) removes from x the elements that are contained in y (see ?setdiff). You'll get the same result as Lapointe's.
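
When whole rows have to be compared, one possible base R sketch collapses each row into a single string key and filters with %in%. The data frames below mirror the matrices above; the helper row_key, the column names a and b, and the "\r" separator are assumptions for illustration, and the separator must not occur in the data itself.

```r
# Data frames mirroring the matrices above (column names are hypothetical).
df1 <- data.frame(a = c(1, 3, 5, 7, 9), b = c(2, 4, 6, 8, 10))
df2 <- data.frame(a = c(1, 3, 5), b = c(2, 4, 6))

# Hypothetical helper: paste all columns of a row into one string key.
# Assumes the separator "\r" never appears in the data.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Keep the rows of df1 whose key does not appear among the keys of df2.
df1_minus_df2 <- df1[!(row_key(df1) %in% row_key(df2)), , drop = FALSE]
print(df1_minus_df2)
```

Because the comparison is done on the combined key, a row is dropped only when all of its columns match a row of df2.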

Jilber Urbina
Here is an easy one, assuming you have a key variable (var_match) that identifies matching rows in both data frames:
df_1_minus_2 <- df_1[!(df_1$var_match %in% df_2$var_match), ]
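
As a rough illustration of the key-based filter, the sketch below uses two hypothetical data frames; the values and the extra value column are made up for the example, and only var_match comes from the answer.

```r
# Hypothetical data: var_match is the shared key column.
df_1 <- data.frame(var_match = c("a", "b", "c", "d"), value = 1:4)
df_2 <- data.frame(var_match = c("b", "d"), value = c(20L, 40L))

# Keep the rows of df_1 whose key does not occur in df_2.
df_1_minus_2 <- df_1[!(df_1$var_match %in% df_2$var_match), ]
print(df_1_minus_2)
```

Only the key is compared here, so rows with keys "b" and "d" are dropped from df_1 even though their value entries differ from those in df_2.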

Momchill