0

EDIT: I just looked at some more ZIP codes in my file and learned that it is the leading zero these codes are missing.

I have a bunch of zip codes formatted like:

zip
8974
8974
4350
4350
7623
55111
98769

As you can see, these are missing the last 0 to meet the 5-digit ZIP code requirement, because of a formatting issue.

I'm trying to do this:

attach(dat)

for(x in zip){
    if(nchar(x) < 5){
        x <- x*10
    }
}

I've also tried this:

for(x in zip){
    if(nchar(x) < 5){
        zip[x] <- x*10
    }
}

But neither produces the desired result. How can I add the zero to these ZIP codes in R?

blacksite
  • 1
    if they are numeric `sprintf("%05d", c(1234, 01234, 12345))` works but depending on your platform `sprintf("%05s", c('1234', '01234', '12345'))` will work for strings only sometimes – rawr Jan 29 '16 at 16:41
  • 1
    For trailing zeros, here's another dupe link http://stackoverflow.com/questions/33656576/use-sprintf-to-add-trailing-zeros – Rich Scriven Jan 29 '16 at 16:48
  • 1
    @RichardScriven nice dupe find on the trailing zeros -- for some reason I couldn't find that when searching google. – josliber Jan 29 '16 at 16:49
  • I figured it out. Thanks. – blacksite Jan 29 '16 at 17:15

2 Answers

3
# zip is a vector of ZIP codes
sapply(zip, function(x) if (nchar(x) < 5) paste0(x, 0) else x)

This should work. It appends a trailing "0" to every element that is fewer than 5 characters long. If you want a leading 0 instead, use `paste0(0, x)`.

Output will be a vector of strings.
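
The same idea can be sketched without `sapply`, since `nchar`, `paste0`, and `ifelse` are all vectorized. This is only an illustration for the leading-zero case; the sample values and the names `zip_chr` and `zip_fixed` are assumptions, not part of the original data.

```r
# Example ZIP codes as they might look after being read in as numbers
zip <- c(8974, 4350, 7623, 55111, 98769)

# Work with strings so that a leading zero can be kept
zip_chr <- as.character(zip)

# Prepend "0" only where the code has fewer than 5 characters
zip_fixed <- ifelse(nchar(zip_chr) < 5, paste0("0", zip_chr), zip_chr)
zip_fixed
# "08974" "04350" "07623" "55111" "98769"
```

Note that this adds a single zero, so a code that has lost two leading zeros would still be too short.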

TJGorrie
1

Are you sure they are missing the final 0, and not the initial 0? A final zero in a number is meaningful, whereas a leading zero does nothing to change the value of a number, and would be dropped by R.
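
A quick way to see the difference, as a small sketch with a made-up value (`zip_num` is just an illustrative name):

```r
# A ZIP code stored as a number cannot hold a leading zero
zip_num <- as.numeric("08974")
zip_num          # 8974
nchar(zip_num)   # 4

# A trailing zero, by contrast, changes the value
zip_num * 10     # 89740
```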

What I would recommend is converting the data into either character or factor, and then using a function to add a zero to those zip codes less than 10000 (thus having only four digits, rather than the desired five). It would look something like this:

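    # Example ZIP codes stored as numbers (leading zeros already lost)
    zip <- c(8974, 8974, 4350, 4350, 7623, 55111, 87969)
    # Convert to character so a leading zero can be stored
    zip <- as.character(zip)
    for (i in seq_along(zip)) {
        # Values below 10000 have fewer than five digits
        if (as.numeric(zip[i]) < 10000) {
            zip[i] <- paste0("0", zip[i])
        }
    }
    zip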

Either way, you shouldn't need to keep the zip codes as numeric values, because you shouldn't be doing mathematical operations on them. They're just geographic labels, so having them as characters or factors shouldn't cause any problems.
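
As a rough sketch of that advice, the column can be read as character in the first place, or padded back afterwards with `sprintf` (as mentioned in the comments on the question). The inline CSV text and the names `csv_text`, `zip_num`, and `zip_fixed` are assumptions made up for the example.

```r
# Hypothetical CSV contents, supplied inline so the example is self-contained
csv_text <- "zip\n08974\n04350\n55111"

# Reading the column as character keeps the leading zeros intact
dat <- read.csv(text = csv_text, colClasses = "character")
dat$zip
# "08974" "04350" "55111"

# If the zeros are already lost, sprintf() can pad integer values to width 5
zip_num <- c(8974, 4350, 55111, 501)
zip_fixed <- sprintf("%05d", as.integer(zip_num))
zip_fixed
# "08974" "04350" "55111" "00501"
```

Unlike prepending a single "0", the `%05d` format pads to a fixed width, so codes that lost more than one leading zero are also restored.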

spf614