How to find missing elements from two data frames

Question

I have two dataframes. The first is a set of addresses including City and State. The second is from the zipcode package. I am attempting to find all of the rows from the first data frame that have an invalid state and zipcode match.

I merged the two data frames and can determine which rows match, but I really need to go the other direction and find the rows that do not match.

Welcome to SO! Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. — r2evans, Apr 12 '19 at 22:39

score 1 · Answer 1 · answered Apr 12 '19 at 23:28

Credit goes to @ericOss: anti_join is the easiest way.

Sample data
Next time, either provide your data or build a small example set, as I did here:

library(zipcode)
data(zipcode)

# Data
df1 <- head(zipcode)
df2  <- head(zipcode)

# Introduce some errors
df2[2,1] <- 0000   # wrong zip (stored as the string "0")
df2[4,3] <- 'FOO'  # wrong state

df1

    zip       city state latitude longitude
1 00210 Portsmouth    NH  43.0059  -71.0132
2 00211 Portsmouth    NH  43.0059  -71.0132
3 00212 Portsmouth    NH  43.0059  -71.0132
4 00213 Portsmouth    NH  43.0059  -71.0132
5 00214 Portsmouth    NH  43.0059  -71.0132
6 00215 Portsmouth    NH  43.0059  -71.0132

df2

    zip       city state latitude longitude
1 00210 Portsmouth    NH  43.0059  -71.0132
2     0 Portsmouth    NH  43.0059  -71.0132
3 00212 Portsmouth    NH  43.0059  -71.0132
4 00213 Portsmouth   FOO  43.0059  -71.0132
5 00214 Portsmouth    NH  43.0059  -71.0132
6 00215 Portsmouth    NH  43.0059  -71.0132

Anti_join
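Then, with dplyr loaded (library(dplyr)), you can use print(df2 %>% anti_join(df1)). Since no by argument is given, the join uses every column the two data frames share. This will give you: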

    zip       city state latitude longitude
1     0 Portsmouth    NH  43.0059  -71.0132
2 00213 Portsmouth   FOO  43.0059  -71.0132

anti_join() returns all rows from x that have no matching values in y, keeping just the columns from x.

(anti_join comes with dplyr; install it using install.packages("dplyr") if you haven't already.)
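
For the situation in the question, where the address table has extra columns and only the zip/state pair should be checked, it may help to name the key columns explicitly. The following is a minimal sketch, not the asker's real data: the data frames `reference` and `addresses`, their column names, and their values are all made up for illustration.

```r
library(dplyr)

# Hypothetical reference table standing in for the zipcode data
reference <- data.frame(
  zip   = c("00210", "00211", "10001"),
  state = c("NH", "NH", "NY"),
  stringsAsFactors = FALSE
)

# Hypothetical address table; rows 2 and 4 are deliberately wrong
addresses <- data.frame(
  id    = 1:4,
  city  = c("Portsmouth", "Portsmouth", "New York", "New York"),
  zip   = c("00210", "00211", "10001", "99999"),
  state = c("NH", "NY", "NY", "NY"),
  stringsAsFactors = FALSE
)

# Keep the rows of `addresses` whose zip/state pair is absent from `reference`
invalid <- anti_join(addresses, reference, by = c("zip", "state"))
print(invalid)
```

Passing by = c("zip", "state") limits the comparison to those two columns, so differences in other columns (such as city spelling) do not flag a row as invalid.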
