2

Here's an example:

library("data.table")

set.seed(12)

table201901 <- data.frame(ID = as.character(paste0(sample(LETTERS,5000, replace=TRUE),sample(0:9,5000, replace=TRUE))),
                 EID = paste0(sample(LETTERS,200, replace=TRUE),sample(letters,5000, replace=TRUE),sample(1:9,5000, replace=TRUE))
                 )

set.seed(15)

table201812 <- data.frame(ID = paste0(sample(LETTERS,5000, replace=TRUE),sample(0:9,5000, replace=TRUE)),
                          EID = paste0(sample(LETTERS,5000, replace=TRUE),sample(letters,5000, replace=TRUE),sample(0:9,5000, replace=TRUE))
)

table201901 <- as.data.table(table201901)
table201812 <- as.data.table(table201812)

setkey(table201901, "ID")
setkey(table201812, "ID")


table201901[!table201812]

fsetdiff(table201901, table201812)


setkey(table201901, "ID","EID")
setkey(table201812, "ID","EID")


table201901[!table201812]

fsetdiff(table201901, table201812)

My goal is to get a table that has line items that are in table201901, but not in table201812.

I thought fsetdiff(table1,table2) and table1[!table2] would give me the same output, but they don't.

Joris C.
  • 5,721
  • 3
  • 12
  • 27
Amanda R.
  • 287
  • 1
  • 2
  • 17
  • Please add all necessary packages to your question. – NelsonGon Jun 27 '19 at 16:13
  • 1
    @NelsonGon Done. Sorry I forgot to add it originally. – Amanda R. Jun 27 '19 at 16:44
  • The answer of @David Arenburg in this question should provide you with the necessary info: https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join – Joris C. Jun 27 '19 at 18:07
  • @JorisChau this works if the second table isn't a completely a subset of table 1 right? (As in say 35% of the rows in table201901 are in table201812, but the other 65% aren't). So I just need to make sure that `all=TRUE` for them to be the same? – Amanda R. Jun 27 '19 at 18:24

1 Answers1

1

The following examples demonstrate both approaches. For simplicity, two small dummy data.tables are generated with partly overlapping rows:

library(data.table)

(dt1 <- data.table(
    col1 = rep(c(1, 3, 3), each = 2),
    col2 = rep(c("only dt1", "match col2", "match row"), each = 2)
))
#>    col1       col2
#> 1:    1   only dt1
#> 2:    1   only dt1
#> 3:    3 match col2
#> 4:    3 match col2
#> 5:    3  match row
#> 6:    3  match row

(dt2 <-  data.table(
    col1 = rep(c(3, 4, 5), each = 2),
    col2 = rep(c("match row", "match col2", "only dt2"), each = 2)    
))   
#>    col1       col2
#> 1:    3  match row
#> 2:    3  match row
#> 3:    4 match col2
#> 4:    4 match col2
#> 5:    5   only dt2
#> 6:    5   only dt2

## keep all rows in dt1 not in dt2
dt1[!dt2, on = names(dt1)]
#>    col1       col2
#> 1:    1   only dt1
#> 2:    1   only dt1
#> 3:    3 match col2
#> 4:    3 match col2

## equivalent output with fsetdiff
fsetdiff(dt1, dt2, all = TRUE)
#>    col1       col2
#> 1:    1   only dt1
#> 2:    1   only dt1
#> 3:    3 match col2
#> 4:    3 match col2

## keep unique rows in dt1 not in dt2
unique(dt1[!dt2, on = names(dt1)])
#>    col1       col2
#> 1:    1   only dt1
#> 2:    3 match col2

## equivalent output with fsetdiff
fsetdiff(dt1, dt2, all = FALSE)
#>    col1       col2
#> 1:    1   only dt1
#> 2:    3 match col2

## keep rows dt1 not matching in dt2 only by "col2" 
dt1[!dt2, on = "col2"]
#>    col1     col2
#> 1:    1 only dt1
#> 2:    1 only dt1

Created on 2019-06-28 by the reprex package (v0.3.0)

Joris C.
  • 5,721
  • 3
  • 12
  • 27
  • Will this display rows where col1 is in both tables, but col2 is only in one table? I want to exclude the row if its the exact same in both, but include it if col1 is the same but col2 is different. – Amanda R. Jun 27 '19 at 20:31
  • See the edited response, `fsetdiff` will drop only exactly matching rows. For `dt1[!dt2, on = ...]` this will depend on the columns in the`on` argument. – Joris C. Jun 28 '19 at 06:17