31

I am working with a data frame (DB) whose HOUR column looks like this:

HOUR ID
1  2
10 4
5  6
20 6

I would like to add a leading zero to the single-digit values and store the result in a new column named NHOUR, like this:

NHOUR HOUR ID
01    1    2
10    10   4
05    5    6
20    20   6

So far I have been struggling with something like this (following some suggestions already given for ifelse in the forum):

DB$NHOUR<-with(DB,ifelse(nchar(HOUR,type="chars")==1),sprintf("%02d",HOUR),as.numeric(HOUR))

but without any success! R always reports "yes" element is not specified, etc.

As always, any tips are appreciated!

stefano
  • 5
    This looks like you're making things way, way too complicated. Why not just `sprintf("%02d",DB$HOUR)`? The whole point of that function is that it pads with leading zeros to a length of 2 characters. – joran Jan 18 '13 at 23:11
  • The `sprintf` and `as.numeric` are not inside the `ifelse` call as they need to be; there is a closing parenthesis before them. Also, you are mixing return types inside the `ifelse` which will lead to type promotion that you may not be expecting. – Brian Diggs Jan 18 '13 at 23:37
  • 5
    Finally, since you're relatively new here and have asked a handful of questions, I think it would be helpful to point out that when an answer solves your problem, it is very helpful to click the check mark next to it. This greatly improves the value of the question (and the site) by giving a clear indication to future users as to which answer solved your problem. Always keep in mind, though, that you are not obligated to ever accept an answer; it is appreciated, but it is always your choice. – joran Jan 18 '13 at 23:41
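To illustrate the two points in Brian Diggs' comment, here is a minimal sketch of how the original `ifelse` call could be repaired. It assumes `DB` is the example data frame from the question (rebuilt here so the snippet runs on its own). Using `as.character` instead of `as.numeric` in the fallback branch is an assumption about the intent, made so that both branches return the same type.

```r
# Example data from the question, rebuilt so the snippet is self-contained
DB <- data.frame(HOUR = c(1, 10, 5, 20), ID = c(2, 4, 6, 6))

# All three arguments now sit inside ifelse(): test, yes, no.
# Both branches return character, so there is no surprising type promotion.
DB$NHOUR <- with(DB, ifelse(nchar(HOUR) == 1,
                            sprintf("%02d", HOUR),
                            as.character(HOUR)))

DB$NHOUR
# [1] "01" "10" "05" "20"

# Mixing types: the numeric "no" value is promoted to character
ifelse(c(TRUE, FALSE), "01", 10)
# [1] "01" "10"
```

As joran's comment notes, a single `sprintf` call makes the `ifelse` unnecessary in practice; the sketch only shows where the parentheses belong.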

4 Answers

58

Simply following the advice in @joran's comment,

DB <- data.frame(
HOUR  = c(1, 10, 5, 20),
ID  = c(2, 4, 6, 6))

NHOUR <- sprintf("%02d",DB$HOUR) # fix to 2 characters 

cbind(NHOUR, DB) # combine old and new data
  NHOUR HOUR ID
1    01    1  2
2    10   10  4
3    05    5  6
4    20   20  6
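As a small sketch of what the `"%02d"` format string does, the snippet below varies the width and also stores the result directly as a new column, which is what the question asked for. The vector name `hours` and the extra values 123 and 7 are only illustrative.

```r
hours <- c(1, 10, 5, 20)

# "%02d": integer ("d"), minimum width 2, padded with zeros ("0")
sprintf("%02d", hours)
# [1] "01" "10" "05" "20"

# The width is only a minimum: wider values are never truncated
sprintf("%02d", 123)
# [1] "123"

# A different width pads to that many characters instead
sprintf("%03d", 7)
# [1] "007"

# Storing the padded values directly as a new column
DB <- data.frame(HOUR = hours, ID = c(2, 4, 6, 6))
DB$NHOUR <- sprintf("%02d", DB$HOUR)
```

Note that `sprintf` returns character strings, so `NHOUR` is a character column and will sort as text rather than as numbers.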

Update 2013-01-21 23:42:00Z: Inspired by daroczig's performance test below, and because I wanted to try out the microbenchmark package, I've updated this answer with a small performance test of my own comparing the three solutions suggested in this thread.

# install.packages(c("microbenchmark", "stringr"), dependencies = TRUE)
require(microbenchmark)
require(stringr)

SPRINTF <- function(x) sprintf("%02d", x)
FORMATC <- function(x) formatC(x, width = 2,flag = 0)
STR_PAD <- function(x) str_pad(x, width=2, side="left", pad="0")

x <- round(runif(1e5)*10)
res <- microbenchmark(SPRINTF(x), STR_PAD(x), FORMATC(x), times = 15)

## Print results:
print(res)
Unit: milliseconds
        expr       min        lq    median        uq      max
1 FORMATC(x) 623.53785 629.69005 638.78667 671.22769 679.8790
2 SPRINTF(x)  34.35783  34.81807  35.04618  35.53696  37.1622
3 STR_PAD(x) 116.54969 118.41944 118.97363 120.05729 163.9664

### Plot results:
boxplot(res)

Box Plot of microbenchmark results

Eric Fail
  • Once again, I complicate the work...I thought sprintf would put a zero in front of any value! Many thanks Joran, also for the clarification in ifelse mistakes and thanks Eric to clearly report the code! – stefano Jan 18 '13 at 23:51
21

I like to use the stringr package:

library(stringr)
DB$NHOUR <- str_pad(DB$HOUR, width=2, side="left", pad="0")
rrs
  • Would you elaborate on why you like the stringr package? – Eric Fail Jan 19 '13 at 04:27
  • 5
    Eric, I like it for readability. When someone else is reading your code and they see `gsub` or, in this case, `sprintf`, it's not really clear what's going on. But the `stringr` functions are very readable. With `str_replace_all`, `str_detect`, or `str_pad`, for example, it's very easy to understand the operations being performed. – rrs Jan 19 '13 at 13:45
  • Thank you for responding to my question; I am always curious to learn new things. Maybe because I come from a non-computer-science background, I don't understand why some code is more readable than other code. For me, elements from the base package are often more _readable_. Furthermore, I did a small speed test (using `proc.time()`), and on my machine `sprintf` comes out almost twice as fast as `str_pad`, but again, speed isn't everything. – Eric Fail Jan 19 '13 at 19:51
  • 2
    @rrs , @eric Please be careful using the 'str_pad' function because it does not convert the number to character before formatting (padding with 0). So if you had an instance like `x=600000` and used `str_pad(x, width = 7, pad = "0")`, your output will be "006e+05" and not "0600000". – Pankil Shah Jun 19 '17 at 19:25
  • Good point. Would you know of a good alternative? – Eric Fail Jun 22 '17 at 08:59
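One possible answer to the question above, sketched with base R only: format the number explicitly as an integer so that it never passes through scientific notation. This assumes the values are whole numbers; the value 600000 and the width 7 are taken from the comment's example.

```r
x <- 600000

# Default character conversion switches to scientific notation here,
# which is the string that ends up being padded in the example above
as.character(x)
# [1] "6e+05"

# Formatting explicitly as an integer avoids that
sprintf("%07d", x)
# [1] "0600000"

formatC(x, width = 7, flag = "0", format = "d")
# [1] "0600000"
```

Both calls return character strings, so either can stand in for the padding step when large values are possible.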
5

Alternative solution:

> formatC(DB$HOUR, width = 2,flag = 0)
[1] "01" "10" "05" "20"

Update: I've just run a quick performance test to document the speed difference for this question:

> library(microbenchmark)
> SPRINTF <- function(x) sprintf("%02d", x)
> FORMATC <- function(x) formatC(x, width = 2,flag = 0)
> x <- round(runif(1e5)*10)
> microbenchmark(SPRINTF(x), FORMATC(x), times = 10)
Unit: milliseconds
        expr       min        lq    median        uq      max
1 FORMATC(x) 688.35430 723.42458 767.06025 780.84768 878.4966
2 SPRINTF(x)  31.29167  31.96052  35.75735  40.54656 147.6805
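Before comparing timings, it can be worth checking that the two functions actually produce the same strings. The snippet below is a minimal sketch of such a check, not part of the original benchmark. It assumes whole-number inputs from 0 to 10, the range that `round(runif(1e5)*10)` generates, and writes the flag as the string `"0"`.

```r
SPRINTF <- function(x) sprintf("%02d", x)
FORMATC <- function(x) formatC(x, width = 2, flag = "0")

# Whole numbers 0..10 stored as doubles, the same kind of values the benchmark uses
x <- as.numeric(0:10)

# TRUE only if both approaches agree on every element
same <- all(SPRINTF(x) == FORMATC(x))
same
```

If `same` were `FALSE`, the timing comparison would be comparing two different results rather than two routes to the same one.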
daroczig
  • 3
    +1 this is much clearer than using `sprintf`. why use old syntax with a modern language? – Matthew Plourde Jan 19 '13 at 05:30
  • 1
    @MatthewPlourde, would you care to expand on your point? What do you mean by _much clearer_? In my quick comparison of `sprintf` and `formatC` the former came out as much faster, but of course speed isn't everything. – Eric Fail Jan 19 '13 at 06:15
  • @EricFail no kidding! the difference in speed is a lot larger than I would've guessed. My enthusiasm for `formatC` has been tempered. If speed isn't a concern, though, I'd prefer it. Many of those who work with R don't come from a programming background. I always favor the approach with the highest readability from the noob point-of-view. but you also got an up vote, because it's Friday night. – Matthew Plourde Jan 19 '13 at 06:33
  • @daroczig, I like that you added the performance test, nice work. – Eric Fail Jan 21 '13 at 22:59
4

Similar to the stringr approach, there is `stri_pad_left` from the stringi package:

library(stringi)
stri_pad_left(str=DB$HOUR, 2, pad="0")
# [1] "01" "10" "05" "20"

It should be pretty much the same speed-wise. There are similar padding functions for right and both sides.

Rorschach