
I have two data frames. The first is a set of addresses that includes city and state. The second is the zipcode data set from the zipcode package. I am trying to find all of the rows in the first data frame whose state and zip code do not form a valid match.

I merged the two data frames successfully and can determine which rows match, but I really need to go the other direction and find the errors.

SteveF
    Welcome to SO! Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Apr 12 '19 at 22:39
  • `anti_join` from the `dplyr` package? – ericOss Apr 12 '19 at 22:52

1 Answer

Credit goes to @ericOss: anti_join is the easiest way.


Sample data
Next time, please provide your data or build a small example set, as I did here:

library(zipcode)
data(zipcode)

# Data
df1 <- head(zipcode)
df2  <- head(zipcode)

# Introduce some errors
df2[2, 1] <- 0000   # wrong zip (the numeric 0 is coerced to the string "0")
df2[4, 3] <- 'FOO'  # wrong state

df1

    zip       city state latitude longitude
1 00210 Portsmouth    NH  43.0059  -71.0132
2 00211 Portsmouth    NH  43.0059  -71.0132
3 00212 Portsmouth    NH  43.0059  -71.0132
4 00213 Portsmouth    NH  43.0059  -71.0132
5 00214 Portsmouth    NH  43.0059  -71.0132
6 00215 Portsmouth    NH  43.0059  -71.0132

df2

    zip       city state latitude longitude
1 00210 Portsmouth    NH  43.0059  -71.0132
2     0 Portsmouth    NH  43.0059  -71.0132
3 00212 Portsmouth    NH  43.0059  -71.0132
4 00213 Portsmouth   FOO  43.0059  -71.0132
5 00214 Portsmouth    NH  43.0059  -71.0132
6 00215 Portsmouth    NH  43.0059  -71.0132

Anti_join
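Then, with dplyr loaded (library(dplyr)), you can use print(df2 %>% anti_join(df1)) to get the rows of df2 that have no match in df1. Because no by argument is given, the join compares all columns the two data frames share. This gives you: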

    zip       city state latitude longitude
1     0 Portsmouth    NH  43.0059  -71.0132
2 00213 Portsmouth   FOO  43.0059  -71.0132

anti_join(x, y) returns all rows from x that have no matching values in y, keeping only the columns from x.
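
In the sample above the join compares every shared column, including city, latitude, and longitude. Real address data usually shares only some columns with the reference table, so it may help to name the key columns explicitly with the `by` argument. The following is a minimal sketch under that assumption; the `reference` and `addresses` data frames and their `id` column are made up for illustration and stand in for the zipcode data and your own address table.

```r
library(dplyr)

# Hypothetical reference table standing in for the zipcode data
reference <- data.frame(
  zip   = c("00210", "00211", "00212"),
  state = c("NH", "NH", "NH"),
  stringsAsFactors = FALSE
)

# Hypothetical address table; rows 2 and 3 are deliberately wrong
addresses <- data.frame(
  id    = 1:3,
  city  = c("Portsmouth", "Portsmouth", "Portsmouth"),
  zip   = c("00210", "00211", "99999"),
  state = c("NH", "MA", "NH"),
  stringsAsFactors = FALSE
)

# Keep only the addresses whose zip/state pair is absent from the reference
invalid <- anti_join(addresses, reference, by = c("zip", "state"))
invalid
```

Here `invalid` holds the address with a state that does not belong to its zip and the address with a zip that is not in the reference at all.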

(anti_join comes with dplyr; install it with install.packages("dplyr") if you haven't already.)
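
If installing dplyr is not an option, roughly the same result can be had in base R by building a composite key from the columns of interest and keeping the rows whose key is missing from the reference. This is only a sketch, not a drop-in replacement for anti_join; the `reference` and `addresses` data frames are made-up examples, and the `"|"` separator assumes that character never occurs in the zip or state values.

```r
# Hypothetical reference table standing in for the zipcode data
reference <- data.frame(
  zip   = c("00210", "00211", "00212"),
  state = c("NH", "NH", "NH"),
  stringsAsFactors = FALSE
)

# Hypothetical address table; rows 2 and 3 are deliberately wrong
addresses <- data.frame(
  id    = 1:3,
  zip   = c("00210", "00211", "99999"),
  state = c("NH", "MA", "NH"),
  stringsAsFactors = FALSE
)

# Combine zip and state into one key per row
key_ref  <- paste(reference$zip, reference$state, sep = "|")
key_addr <- paste(addresses$zip, addresses$state, sep = "|")

# Rows of addresses whose zip/state pair does not appear in the reference
invalid <- addresses[!(key_addr %in% key_ref), ]
invalid
```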

CodeNoob