
Say I have the following data set:

a <- 1:3
b <- 1:3
df1 <- expand.grid(a, b)  # 3 x 3 grid of parameter combinations
c <- 1:5
d <- 1:5
df2 <- expand.grid(c, d)  # expanded 5 x 5 grid; contains every row of df1
df3 <- rbind(df1, df2)    # stacked, so df1's rows now appear twice
df4 <- unique(df3)        # drops duplicates but keeps one copy of each

The premise is that df1 holds one set of parameter combinations and df2 holds the same parameters over an expanded range. In the example, unique() removes the duplicated rows from df3, but it keeps one copy of each, so df4 ends up being effectively the same as df2.

I need to remove the duplicated rows entirely, keeping no copy at all, unlike unique.

Basically, I need a df5 that contains only the rows of df2 that are not in df1, or equivalently, I need to remove every row that appears more than once in df3.

df5
   Var1 Var2
1     4    1
2     5    1
3     4    2
4     5    2
5     4    3
6     5    3
7     1    4
8     2    4
9     3    4
10    4    4
11    5    4
12    1    5
13    2    5
14    3    5
15    4    5
16    5    5
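
For reference, a minimal base R sketch of the second reading, dropping every row that occurs more than once in df3: duplicated() scanned from both directions flags every occurrence of a repeated row, not just the later ones.

# flag rows duplicated from the top and from the bottom, keep the rest
dup_any <- duplicated(df3) | duplicated(df3, fromLast = TRUE)
df5 <- df3[!dup_any, ]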
MichaelE
  • No, !duplicated behaves like unique here. In the listed sample the 0 1 4 and 1 0 2 rows should also be removed, leaving only 1 1 4, which is truly unique. – MichaelE Apr 14 '20 at 20:02

2 Answers


We can use anti_join from dplyr:

library(dplyr)
anti_join(df2, df1)
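
Called without a by argument, anti_join matches on all columns the two frames share (here Var1 and Var2) and prints a message naming them; spelling the keys out keeps the join explicit and silences the message:

anti_join(df2, df1, by = c("Var1", "Var2"))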

Or setdiff from dplyr, whose data frame method compares whole rows:

setdiff(df2, df1)
#   Var1 Var2
#1     4    1
#2     5    1
#3     4    2
#4     5    2
#5     4    3
#6     5    3
#7     1    4
#8     2    4
#9     3    4
#10    4    4
#11    5    4
#12    1    5
#13    2    5
#14    3    5
#15    4    5
#16    5    5

Or with fsetdiff

library(data.table)
fsetdiff(setDT(df2), setDT(df1))
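
Note that setDT converts df1 and df2 to data.tables by reference. If the originals should stay plain data frames, a copy-based variant of the same call works too:

fsetdiff(as.data.table(df2), as.data.table(df1))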
akrun

In dplyr, the following function should get you what you want:

setdiff(df2, df1)

It is essentially a left excluding SQL join: a LEFT JOIN that keeps only the rows of the left table that have no match in the right table.
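
For comparison, a minimal base R sketch of the same anti-join (an illustration, not part of the original answer), matching rows by pasting each row's columns into a single key string:

# one key string per row; drop rows of df2 whose key also occurs in df1
keys1 <- do.call(paste, df1)
keys2 <- do.call(paste, df2)
df5 <- df2[!keys2 %in% keys1, ]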

Jamie_B