I'm sure someone has asked this before, or that I could research an efficient way to do this, but I'm tight on time and not sure how to word my issue.

I have a large data frame, and I noticed that one of its columns contains some odd values.

head(testCA_extract[5])
   ZIP_CODE
1     94801
2     94801
3 928034250
4     92714
5     95054
6     94565

from

> head(testCA_extract[2:6])
  REPORTING_YEAR STATE_COUNTY_FIPS_CODE  COUNTY_NAME  ZIP_CODE   CITY_NAME
1           1990                  06013 CONTRA COSTA     94801    RICHMOND
2           1990                  06013 CONTRA COSTA     94801    RICHMOND
3           1990                  06059       ORANGE 928034250     ANAHEIM
4           1990                  06059       ORANGE     92714      IRVINE
5           1990                  06085  SANTA CLARA     95054 SANTA CLARA
6           1990                  06013 CONTRA COSTA     94565   PITTSBURG

For anyone unfamiliar, ZIP codes are supposed to be exactly 5 digits. I'm not sure why there are extra digits, but it appears that the first 5 digits, regardless of length, are the correct ZIP code.

So I need to either select only the first 5 digits, or truncate the variable to its first 5 digits and delete the rest. Then I need that information to go back into its proper row and column in the data frame.

Ronak Shah
  • Please don't add the `rstudio` tag to questions that are only about R. The `rstudio` tag is reserved for questions related to the RStudio IDE. Also, here's the link for your answer: https://stackoverflow.com/questions/38750535/extract-the-first-2-characters-in-a-string – Ronak Shah Nov 29 '20 at 11:05
  • ok, I'll keep that in mind in the future – Jason Deutsch Nov 29 '20 at 11:25

1 Answer

For your future posts, it is good practice to include a minimal, reproducible example. In this simple case,

x <- as.numeric(substr(as.character(x), 1, 5))

where x is the variable containing your ZIP codes should do the trick.
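
To put the result back into its row and column, the same expression can be assigned to the column itself. The following is only a sketch: the data frame below is a toy stand-in built from the values shown in the question, not the real `testCA_extract`, and it assumes the column is numeric (or a factor or character vector).

```r
# Toy stand-in for testCA_extract, using values copied from the question.
testCA_extract <- data.frame(
  ZIP_CODE  = c(94801, 94801, 928034250, 92714, 95054, 94565),
  CITY_NAME = c("RICHMOND", "RICHMOND", "ANAHEIM",
                "IRVINE", "SANTA CLARA", "PITTSBURG")
)

# Convert to text, keep the first five characters, convert back to numbers,
# and overwrite the column in place so every row keeps its position.
testCA_extract$ZIP_CODE <- as.numeric(
  substr(as.character(testCA_extract$ZIP_CODE), 1, 5)
)

testCA_extract
```

Two caveats worth checking on the real data: converting back with `as.numeric()` drops leading zeros (not an issue for California ZIP codes, which start with 9), and very round numeric values may be rendered in scientific notation by `as.character()`, in which case `format(x, scientific = FALSE)` may be a safer first step.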

yrx1702
  • I will try. Sometimes it feels like it could make things more confusing, since I might make the example DF wrong. Does it help readability, or does it make things faster for people? – Jason Deutsch Nov 29 '20 at 10:30
  • @JasonDeutsch it's far easier for answerers, which makes it more likely you will get a quick and accurate answer. It's very easy to make a reproducible example. In your case the output produced by `dput(head(testCA_extract[2:6]))` will recreate these rows perfectly. – Allan Cameron Nov 29 '20 at 10:39
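
As a rough illustration of the `dput()` suggestion, here is a small sketch. `example_df` is a hypothetical two-row data frame made up for the demonstration; with the real data you would call `dput()` on a few rows of `testCA_extract` and paste the printed text into the question.

```r
# Hypothetical small data frame standing in for a few rows of the real data.
example_df <- data.frame(
  ZIP_CODE  = c(94801, 928034250),
  CITY_NAME = c("RICHMOND", "ANAHEIM")
)

# dput() prints R code that rebuilds the object; this text is what goes
# into the question.
dput(example_df)

# Round trip: capture that text, evaluate it, and compare with the original.
code_text <- paste(capture.output(dput(example_df)), collapse = "\n")
rebuilt   <- eval(parse(text = code_text))
identical(example_df, rebuilt)
```

Anyone answering can then paste the printed `structure(...)` call into their own session and work with the same rows.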