How do I constrain a variable to 5 digits but ensure it only deletes from the right?

Question

I'm sure someone has asked this before, or that I could research an efficient way to do this, but I'm short on time and not sure how to word my issue.

I have a large data frame, and I noticed that for some reason one of my columns contains odd values.

```
> head(testCA_extract[5])
   ZIP_CODE
1     94801
2     94801
3 928034250
4     92714
5     95054
6     94565
```

The column comes from this larger data frame:

```
> head(testCA_extract[2:6])
  REPORTING_YEAR STATE_COUNTY_FIPS_CODE  COUNTY_NAME  ZIP_CODE   CITY_NAME
1           1990                  06013 CONTRA COSTA     94801    RICHMOND
2           1990                  06013 CONTRA COSTA     94801    RICHMOND
3           1990                  06059       ORANGE 928034250     ANAHEIM
4           1990                  06059       ORANGE     92714      IRVINE
5           1990                  06085  SANTA CLARA     95054 SANTA CLARA
6           1990                  06013 CONTRA COSTA     94565   PITTSBURG
```

For anyone unfamiliar, ZIP codes are supposed to be exactly 5 digits. I'm not sure why some entries have extra digits, but it appears that the first 5 digits, regardless of the value's length, are the correct ZIP code.
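The 9-digit values appear to be ZIP+4 codes stored without the hyphen (928034250 would read 92803-4250), which fits the observation that the first 5 digits are the real ZIP code. A minimal sketch for spotting the affected rows, assuming a hypothetical data frame `zips` rebuilt from the six rows printed above and a hypothetical helper `zip_as_text()`:

```r
# Hypothetical sample rebuilt from the six rows printed above
zips <- data.frame(
  CITY_NAME = c("RICHMOND", "RICHMOND", "ANAHEIM", "IRVINE", "SANTA CLARA", "PITTSBURG"),
  ZIP_CODE  = c(94801, 94801, 928034250, 92714, 95054, 94565)
)

# Hypothetical helper: return the ZIP codes as plain digit strings.
# sprintf("%.0f", ...) is used for numeric columns because as.character()
# can switch to scientific notation for some round numbers.
zip_as_text <- function(z) {
  if (is.numeric(z)) sprintf("%.0f", z) else as.character(z)
}

# TRUE for rows whose ZIP code has more than 5 digits
too_long <- nchar(zip_as_text(zips$ZIP_CODE)) > 5
zips[too_long, ]
```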

So I need to either select only the first 5 digits, or constrain the variable to the first 5 digits and delete the rest. Then I need that result to go back to its proper row and column in the data frame.

Please don't add the `rstudio` tag to questions related only to R. The `rstudio` tag is reserved for questions related to the RStudio IDE. Also, here's the link for your answer: https://stackoverflow.com/questions/38750535/extract-the-first-2-characters-in-a-string – Ronak Shah, Nov 29 '20 at 11:05

score 0 · Answer 1 · answered Nov 29 '20 at 10:26


For future posts, it's good practice to include a minimal, reproducible example. In this simple case,

```r
# Keep only the first 5 characters of each value, then convert back to a number
x <- as.numeric(substr(as.character(x), 1, 5))
```

where `x` is the variable containing your ZIP codes, should do the trick.
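A sketch of applying the same idea to the data frame column directly, using a hypothetical stand-in for the real `testCA_extract` built from the rows shown in the question. Because the result is assigned back to the same column, each truncated value stays in its original row. Unlike the answer, this variant uses `sprintf("%.0f", ...)` instead of `as.character()` for numeric input (to avoid scientific notation such as `"1e+05"`) and leaves the column as character rather than converting it back with `as.numeric()`:

```r
# Hypothetical stand-in for testCA_extract, rebuilt from the rows in the question
testCA_extract <- data.frame(
  COUNTY_NAME = c("CONTRA COSTA", "CONTRA COSTA", "ORANGE", "ORANGE", "SANTA CLARA", "CONTRA COSTA"),
  ZIP_CODE    = c(94801, 94801, 928034250, 92714, 95054, 94565),
  CITY_NAME   = c("RICHMOND", "RICHMOND", "ANAHEIM", "IRVINE", "SANTA CLARA", "PITTSBURG")
)

# Convert to plain digit strings; sprintf avoids scientific notation for doubles
zip_chr <- if (is.numeric(testCA_extract$ZIP_CODE)) {
  sprintf("%.0f", testCA_extract$ZIP_CODE)
} else {
  as.character(testCA_extract$ZIP_CODE)
}

# substr() keeps characters 1 to 5, so extra digits are dropped from the right.
# Assigning back to the same column keeps every value in its original row.
testCA_extract$ZIP_CODE <- substr(zip_chr, 1, 5)

testCA_extract
```

Keeping ZIP codes as character is a design choice: they are identifiers rather than quantities, and a numeric column would drop leading zeros for ZIP codes outside California (a code such as 02134 would become 2134). If a numeric column is needed downstream, wrapping the last assignment in `as.numeric()` reproduces the answer's result.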

answered Nov 29 '20 at 10:26

yrx1702


I will try. Sometimes it feels like it could make things more confusing, since I might build the example DF wrong. Does it help readability, or does it make things faster for people? – Jason Deutsch, Nov 29 '20 at 10:30

@JasonDeutsch It's far easier for answerers, which makes it more likely you will get a quick and accurate answer. Making a reproducible example is very easy: in your case, the output produced by `dput(head(testCA_extract[2:6]))` will recreate these rows perfectly. – Allan Cameron, Nov 29 '20 at 10:39
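To illustrate the comment's point: `dput()` prints R code that rebuilds an object, so there is no need to retype the example data frame by hand. A minimal sketch using a hypothetical three-row stand-in for `head(testCA_extract[2:6])`; the round trip through `dget()` on a temporary file is only there to show that the printed code recreates the same data frame:

```r
# Hypothetical stand-in; on the real data you would run
# dput(head(testCA_extract[2:6])) instead
sample_rows <- data.frame(
  REPORTING_YEAR = c(1990, 1990, 1990),
  # FIPS codes kept as character so the leading 0 survives
  STATE_COUNTY_FIPS_CODE = c("06013", "06013", "06059"),
  ZIP_CODE = c(94801, 94801, 928034250)
)

# Prints R code that rebuilds sample_rows; this is what you paste into a question
dput(sample_rows)

# Round-trip check: write the dput() text to a temp file and read it back
tmp <- tempfile(fileext = ".R")
dput(sample_rows, file = tmp)
rebuilt <- dget(tmp)
identical(rebuilt, sample_rows)
```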
