Comparing two large datasets and printing out any values that do not match in R

Question

I have two large data sets named "aaup1" and "aaup2" and I am trying to print the values that do not match between the two.

g <- data.frame(aaup1)
h <- data.frame(aaup2)
subset(g, !(aaup1 %in% h$aaup2))
setdiff(g$aaup1, h$aaup2)

The subset and setdiff lines were two attempts at doing this, but neither of them works.

(1) I think this is best done with a merge/join operation: find the "key" that is shared between them, and `merge(.., by=keys)`, where all of the "values" you want to compare are *not* keys (good refs: https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272). (2) There is very little we can do to help lacking *any* representative data. See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info, specifically about using `dput(.)` or `read.table(.)` or `data.frame(.)` to share usable data with us. — r2evans, Apr 23 '22 at 01:30
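The merge approach suggested in the comment could look roughly like the sketch below. The real structure of aaup1 and aaup2 was never posted, so the key column `id`, the value column `salary`, and all the numbers are hypothetical stand-ins chosen only for illustration.

```r
# Hypothetical stand-ins for the real data: `id` is the shared key,
# `salary` is the value to compare. Neither name comes from the question.
aaup1 <- data.frame(id = c(1, 2, 3, 4, 5), salary = c(50, 60, 70, 80, 90))
aaup2 <- data.frame(id = c(1, 2, 3, 4, 6), salary = c(50, 65, 70, 85, 95))

# Full outer join on the key; the value columns get one suffix per source.
m <- merge(aaup1, aaup2, by = "id", all = TRUE, suffixes = c(".1", ".2"))

# Keep rows whose values differ, plus rows whose key exists in only one data set.
mismatch <- m[is.na(m$salary.1) | is.na(m$salary.2) | m$salary.1 != m$salary.2, ]
print(mismatch)
```

With all = TRUE, keys that are missing from one side show up with NA in that side's column, so they are reported as mismatches as well. Drop all = TRUE to compare only the keys both data sets share.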

score 0 · Answer 1 · answered Apr 23 '22 at 11:44

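You could coerce both data frames to matrices with as.matrix before calling setdiff. Note that this compares individual values pooled across all columns, not whole rows, and returns the values of df1 that do not occur anywhere in df2.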

list(df1, df2) |> lapply(as.matrix) |> do.call(what='setdiff')
#  [1]  21.000  22.800  18.700  18.100  14.300  24.400  17.800  16.400  10.400 160.000 108.000
# [12] 258.000 360.000 225.000 146.700 140.800 167.600 275.800 472.000 460.000 110.000  93.000
# [23] 105.000  62.000  95.000 123.000 180.000 205.000 215.000   3.900   3.850   3.210   3.690
# [34]   3.920   3.070   2.930   2.620   2.875   2.320   3.215   3.440   3.460   3.190   4.070
# [45]   3.780   5.250   5.424  16.460  17.020  18.610  19.440  20.220  15.840  20.000  22.900
# [56]  18.300  17.400  17.600  18.000  17.980  17.820

Data:

df1 <- mtcars[1:16, ] |> `rownames<-`(NULL)
df2 <- mtcars[-(1:16), ] |> `rownames<-`(NULL)
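Since setdiff is not symmetric, the call above only reports values of df1 that are missing from df2. If values unmatched in either direction are wanted, one possible sketch is to run it both ways and combine the results. The variable names v1, v2, only_in_df1, only_in_df2 and unmatched are illustrative, not part of the answer.

```r
# Same example data as in the answer (mtcars ships with base R).
df1 <- mtcars[1:16, ]
df2 <- mtcars[-(1:16), ]

# Flatten each data frame into a plain vector of its values.
v1 <- as.vector(as.matrix(df1))
v2 <- as.vector(as.matrix(df2))

only_in_df1 <- setdiff(v1, v2)  # values present in df1 but absent from df2
only_in_df2 <- setdiff(v2, v1)  # values present in df2 but absent from df1

# Values that occur in exactly one of the two data sets.
unmatched <- union(only_in_df1, only_in_df2)
print(unmatched)
```

This still pools values across all columns, so a number found in one column of df1 counts as matched if it occurs in any column of df2.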
