I have a large dataset of addresses that include U.S. ZIP codes. Some of the ZIP codes are in five-digit format and others are in nine-digit format. Regardless of format, any ZIP code that should start with a zero (like many in Rhode Island) has had that leading zero dropped. So I need to go through the d$zip column, identify observations where the zip has length 4 or length 8, and put paste0("0", d$zip) in its place to add back the leading zero. My question is how to write the conditional check efficiently, given that I have almost 100,000 addresses.

Here is a toy df:

structure(list(ID = 1:3, street = c("555 Mockingbird Way", "909 Deadend Alley", 
"1475 Wrongway Rd"), city = c("Anywhere", "Over There", "Nowhere"
), state = c("RI", "RI", "TX"), zip = c("2863", "28632142", "78215"
)), class = "data.frame", row.names = c(NA, -3L))

Note: There are two related questions already, but they do not address the check for the 4- or 8-digit format.

neilfws
KLB

1 Answer

Assuming that the data frame is named dataset, this should work:

dataset$new_zip <- ifelse(nchar(dataset$zip) %in% c(4, 8), 
                          paste0("0", dataset$zip), 
                          dataset$zip) 
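
As a quick check, here is a minimal sketch of the same length test on the toy data from the question, written with logical indexing instead of ifelse(). The helper vector needs_zero is a name introduced here for illustration only, and the data frame is trimmed to the columns that matter. Both forms are vectorized, so roughly 100,000 rows should not be a problem.

```r
# Toy data from the question (street and city omitted); `dataset` is the
# name assumed in the answer above.
dataset <- data.frame(
  ID = 1:3,
  state = c("RI", "RI", "TX"),
  zip = c("2863", "28632142", "78215"),
  stringsAsFactors = FALSE
)

# Flag rows whose ZIP code has lost its leading zero (4 or 8 characters).
# `needs_zero` is a hypothetical helper name, not part of the original answer.
needs_zero <- nchar(dataset$zip) %in% c(4, 8)

# Copy the column, then prepend "0" only to the flagged rows.
dataset$new_zip <- dataset$zip
dataset$new_zip[needs_zero] <- paste0("0", dataset$zip[needs_zero])

dataset$new_zip
# "02863" "028632142" "78215"
```

Writing to a new column keeps the original zip values around, which makes it easy to compare before and after.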
neilfws
  • I understand the logic in this answer, but it throws an error for me: Error in `$<-.data.frame`(`*tmp*`, new_zip, value = logical(0)) : replacement has 0 rows, data has 82266 – KLB May 22 '23 at 05:09
  • There must be something about your real data which is different to your example data (for which the code works without issues). Can you figure out what that is? – neilfws May 22 '23 at 05:13
  • 1
    Yes. I made a silly capitalization error that I just caught. Works perfectly. Thank you! – KLB May 22 '23 at 05:22
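
For anyone who hits the same message, here is a small sketch of how a miscapitalized column name can produce it. The exact typo was not stated in the comments, so the name Zip below is only an assumed example, as is the variable bad. Referring to a column that does not exist returns NULL, the condition then has length zero, and ifelse() returns logical(0), which cannot be assigned to a data frame that has rows.

```r
dataset <- data.frame(zip = c("2863", "28632142", "78215"),
                      stringsAsFactors = FALSE)

# "Zip" (capital Z) is a hypothetical misspelling; no such column exists,
# so `$` silently returns NULL instead of raising an error.
is.null(dataset$Zip)

# nchar(NULL) has length zero, so ifelse() returns logical(0).
bad <- ifelse(nchar(dataset$Zip) %in% c(4, 8),
              paste0("0", dataset$Zip),
              dataset$Zip)
length(bad)

# Assigning `bad` to dataset$new_zip would fail with
# "replacement has 0 rows, data has 3".

# A guard like this fails early if the expected column is missing.
stopifnot("zip" %in% names(dataset))
```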