0

I have a data.table as follows:

library(data.table)
library(haven)
df1 <- fread(
    "A   B   C  iso   year   
     0   B   1  NLD   2009   
     1   A   2  NLD   2009   
     0   Y   3  AUS   2011   
     1   Q   4  AUS   2011   
     0   NA  7  NLD   2008   
     1   0   1  NLD   2008   
     0   1   3  AUS   2012",
  header = TRUE
)

Now with the following data.table I have simply removed two lines:

df2 <- fread(
    "A   B   C  iso   year   
     0   B   1  NLD   2009   
     0   Y   3  AUS   2011   
     1   Q   4  AUS   2011   
     0   NA  7  NLD   2008   
     1   0   1  NLD   2008",
  header = TRUE
)

What I want, is to figure out, what unique values have been removed. For example, below I have removed two lines, but only one value that was unique in df1:

df1[, uniqueN(.SD), .SDcols=c("iso", "year")] # 4

df2[, uniqueN(.SD), .SDcols=c("iso", "year")] # 3

What I want is to extract the unique value that was removed from df1. Desired answer:

AUS 2012

I have a feeling this is quite a difficult question (or hopefully, I am missing something obvious). But perhaps someone can get me on the right track..

Tom
  • 2,173
  • 1
  • 17
  • 44
  • 2
    Do you need an `anti_join` ? `dplyr::anti_join(df1, df2, by = c('iso', 'year'))` . See the `data.table` version here https://stackoverflow.com/a/28703077 – Ronak Shah Jul 11 '21 at 13:05
  • Thank you Ronak. At first sight that appears to work very well. I'm going to experiment with it a bit. – Tom Jul 11 '21 at 13:07
  • 1
    One direct way to get the unique values (your desired output) while joining is to combine anti-join and `unique` function together . `cols <- c("iso", "year")`; `df1[!df2, unique(.SD), on=cols, .SDcols=cols]`. Note the use of the *exclamation point* in front of `df2`. This allows to perform anti-join. – B. Christian Kamgang Jul 11 '21 at 13:45

0 Answers0