
I am trying to extract only the zip code values from my imported ACS data file. However, every row includes "ZCTA" before the 5-digit zip code. Is there a way to remove that prefix so that only the 5-digit zip code remains?

Example:

[Image of data frame with ZCTA and Zip]

I tried using strtrim on the data, but I can't figure out how to target the last 5 digits. I imagine there is a function or loop that could also do this, since the dataset is so large.

kelsz24
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064). – MrFlick Oct 26 '22 at 19:03

2 Answers


To remove "ZCTA5":

trimws(gsub("ZCTA5", "", df$zip))  # df is your data.frame; trimws() drops any space left behind by the prefix

or

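library(stringr)
str_trim(str_replace(df$zip, "ZCTA5", ""))  # str_trim() drops any space left behind by the prefix

To extract the ZIP code directly (the last five characters):

str_sub(df$zip, -5, -1)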
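
Here is a minimal sketch of both approaches on made-up data. It assumes the values look like "ZCTA5 12345" (the real format is only visible in the image, so that is a guess), and the column name `zip5` is hypothetical. It uses base R only, with `substr()` standing in for `str_sub(x, -5, -1)`.

```r
# Hypothetical sample data; the "ZCTA5 12345" format is an assumption
df <- data.frame(zip = c("ZCTA5 01001", "ZCTA5 90210", "ZCTA5 10027"),
                 stringsAsFactors = FALSE)

# Approach 1: remove the literal prefix, then trim the space it leaves behind
zip_a <- trimws(gsub("ZCTA5", "", df$zip, fixed = TRUE))

# Approach 2: keep only the last five characters of each value
n <- nchar(df$zip)
zip_b <- substr(df$zip, n - 4, n)

# Store the result as character so leading zeros (e.g. "01001") are kept
df$zip5 <- zip_b
```

Both calls are vectorized, so no loop is needed even on a large dataset. Converting the result with `as.numeric()` would drop leading zeros, so it is usually safer to leave ZIP codes as character.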
KacZdr

Here are a few others for fun:

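# Option 1: extract the run of digits at the end of the string that follows a whitespace character
stringr::str_extract(df$zip, "(?<=\\s)\\d+$")

# Option 2: capture the trailing digits after the last whitespace and keep only that group
gsub("^.*\\s(\\d+)$", "\\1", df$zip)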
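
One difference between the two options is worth knowing: when a value does not match, `str_extract()` returns `NA`, while `gsub()` returns the input unchanged. Below is a hedged base-R sketch that makes non-matching rows explicit. The sample vector `x` and the stricter `\\d{5}` pattern are my own assumptions, not taken from the question.

```r
# Hypothetical input, including one value that does not follow the pattern
x <- c("ZCTA5 01001", "ZCTA5 90210", "not a zcta")

# Require exactly five digits at the end, preceded by whitespace
pattern <- "^.*\\s(\\d{5})$"

# Keep the captured digits where the pattern matches, NA otherwise
zip5 <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)

# Positions of values that could not be parsed, for manual inspection
bad_rows <- which(is.na(zip5))
```

Checking `bad_rows` before going further is a cheap way to catch header rows or malformed entries in a large file.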
AndS.