0

I have a list of unique client names and their zip codes.

I need to get extra information regarding the zip code such as state and county.

I discovered the package zipcodeR but wasn't able to use the reverse_zipcode function.

Below is a sample dataset.

df <- data.frame(name = c('Juan', 'Alice', 'Tim', 'Joe'),
                 zipc = c('11374', '11374', '11379', 'A145'))

I need an state and county column for my dataframe.

   name  zipc
1  Juan 11374
2 Alice 11374
3   Tim 11379
4   Joe  A145
M--
  • 25,431
  • 8
  • 61
  • 93
mar355
  • 155
  • 9

2 Answers2

2

Using zipcodeR package it can be done as follows:

library(dplyr)
library(zipcodeR)

zip_code_db %>% 
  select(zipcode, state, county) %>% 
  right_join(df, by = c("zipcode"="zipc"), multiple = "all")

  zipcode state        county  name
1   11374    NY Queens County  Juan
2   11374    NY Queens County Alice
3   11379    NY Queens County   Tim
4    A145  <NA>          <NA>   Joe
M--
  • 25,431
  • 8
  • 61
  • 93
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    Thank you so much. This is exactly what I needed. I just got an warning message :Warning message: In right_join(., vis2, by = c(zipcode = "zip2")) : Each row in `x` is expected to match at most 1 row in `y`. ℹ Row 129 of `x` matches multiple rows. ℹ If multiple matches are expected, set `multiple = "all"` to silence this warning." but the result was perfect. – mar355 May 04 '23 at 19:04
0

It looks like the reverse_zipcode function doesn't return a value, even NA, when it can't find the zip code in the zip_code_db data frame included with the zipcodeR package. The example below checks whether the zip code "looks" correct and otherwise returns NA:

library(zipcodeR)
library(purrr)
library(dplyr)
data.frame(name = c('Juan', 'Alice', 'Tim', 'Joe'),
                 zipc = c('11374', '11374', '11379', 'A145')) %>%
  mutate(test2 = map_chr(zipc,
                         ~ ifelse(str_length(.) == 5 & !is.na(as.numeric(.)),
                                  zipcodeR::reverse_zipcode(.)$state,NA)))
Joe Robert
  • 84
  • 3
  • 1
    This is good (it's how I thought of doing it) but it is not as good as @TarJae's accepted answer because it would get hung up on all-numeric but invalid zip codes (e.g., does zipcode 99999 exist?) – Ben Bolker May 04 '23 at 19:19
  • 1
    Based on how the `reverse_zipcode` function works, it would return `NA` if given zipcode 99999. I was slightly incorrect when I said that the function doesn't return anything if it can't find the zipcode in its data set -- it doesn't return anything if it deems the zipcode "invalid" as is the case with "A145." – Joe Robert May 04 '23 at 19:22