-8

I have data like the following:

98-45.3A-22
104-44.0A-23
983-29.1-22
1757-42.5A-22
4968-37.3A2-23

I want to add leading zeros so that the number before the first hyphen is 6 digits long:

000098-45.3A-22
000104-44.0A-23
000983-29.1-22
001757-42.5A-22
004968-37.3A2-23 
Vinay
  • 1
    Use strsplit, then see [this post](http://stackoverflow.com/questions/5812493) for padding, then paste it back together again. Show some code effort. – zx8754 Sep 08 '16 at 06:40
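
The approach suggested in the comment above (split, pad, paste back) could look roughly like the following in base R. This is only a sketch: the vector `x` is a stand-in for the question's column, and `formatC` is one of several padding options, chosen here as an assumption rather than taken from the comment.

```r
# Sample values from the question; `x` is a hypothetical stand-in for the real column
x <- c("98-45.3A-22", "104-44.0A-23", "983-29.1-22",
       "1757-42.5A-22", "4968-37.3A2-23")

# Split every string on the literal hyphen
parts <- strsplit(x, "-", fixed = TRUE)

# Pad the first piece to width 6 with zeros, then glue the pieces back together
padded <- vapply(parts, function(p) {
  p[1] <- formatC(as.integer(p[1]), width = 6, flag = "0")
  paste(p, collapse = "-")
}, character(1))

padded
```

Splitting on every hyphen is safe here because the pieces are rejoined with the same separator, so only the first piece changes.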

2 Answers

2

We can use `sub` twice. First, extract the number before the first `-` by matching that `-` followed by any characters (`.*`) through the end of the string, replacing the match with `""`, and converting the result to numeric (`as.numeric`). Second, extract the rest of the string, from the first `-` to the end, by matching one or more characters that are not a `-` (`[^-]+`) at the start of the string and replacing them with `""`. Pass both pieces to `sprintf` with the format `"%06d%s"`, which zero-pads the number to 6 digits and appends the remainder.

df1$V1 <- sprintf("%06d%s", as.numeric(sub("\\-.*", "", df1$V1)), sub("^[^-]+", "", df1$V1))
df1
#               V1
#1  000098-45.3A-22
#2  000104-44.0A-23
#3   000983-29.1-22
#4  001757-42.5A-22
#5 004968-37.3A2-23
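
To see what each piece contributes, the one-liner can be unrolled into its intermediate steps. The names `x`, `head_num`, `tail_str` and `result` are hypothetical, introduced only for this illustration, and `x` stands in for `df1$V1`.

```r
# Sample values from the question; `x` is a hypothetical stand-in for df1$V1
x <- c("98-45.3A-22", "104-44.0A-23", "983-29.1-22",
       "1757-42.5A-22", "4968-37.3A2-23")

# Drop everything from the first hyphen onward, leaving the leading number
head_num <- as.numeric(sub("\\-.*", "", x))

# Drop the leading run of non-hyphen characters, leaving "-45.3A-22" etc.
tail_str <- sub("^[^-]+", "", x)

# Zero-pad the number to width 6 and append the untouched remainder
result <- sprintf("%06d%s", head_num, tail_str)
result
```

`%06d` accepts the output of `as.numeric` here because every value is a whole number; `as.integer` would serve equally well.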

We can also do this in a single step using `gsubfn`. Here we match the digits (`\\d+`) at the start (`^`) of the string and capture them as a group; the replacement function converts the captured group to numeric and reformats it with `sprintf`.

library(gsubfn)
gsubfn("^(\\d+)", ~sprintf("%06d", as.numeric(x)), df1$V1)
#[1] "000098-45.3A-22"  "000104-44.0A-23"  "000983-29.1-22"   
#[4] "001757-42.5A-22"  "004968-37.3A2-23"
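
If adding a package is not an option, a similar in-place replacement of the leading digits can be sketched in base R with `regexpr` and `regmatches<-`. This is not part of the answer above, only an assumed equivalent; `x` and `padded` are hypothetical names.

```r
# Sample values from the question; `x` is a hypothetical stand-in for df1$V1
x <- c("98-45.3A-22", "104-44.0A-23", "983-29.1-22",
       "1757-42.5A-22", "4968-37.3A2-23")

# Locate the run of digits at the start of each string
m <- regexpr("^[0-9]+", x)

# Work on a copy and overwrite only the matched digits with their padded form
padded <- x
regmatches(padded, m) <- sprintf("%06d", as.integer(regmatches(x, m)))
padded
```

This sketch assumes every element starts with digits; elements without a match would be left unchanged, but the replacement vector must then contain one value per matched element only.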

data

df1 <- structure(list(V1 = c("98-45.3A-22", "104-44.0A-23", "983-29.1-22", 
"1757-42.5A-22", "4968-37.3A2-23")), .Names = "V1", class = "data.frame", 
row.names = c(NA, -5L))
akrun
-6

Since I have more than 5 lakh (500,000) rows to replace, `sprintf` was taking a lot of time. The following, using the stringr package, was faster for me:

library(stringr)
df1$V1 <- str_replace(df1$V1, str_extract(df1$V1, "\\d+"), str_pad(str_extract(df1$V1, "\\d+"), 6, pad = "0"))
Vinay
  • 3
    this is exactly the answer I gave you in a comment on [this question](http://stackoverflow.com/q/39166496/4137985) (except that I explicitly named the package used...), so why ask the question again when you already have an answer that suits you? – Cath Sep 08 '16 at 07:47
  • After looking into your comment I gave it a whirl and found an answer. – Vinay Sep 08 '16 at 08:24
  • 1
    "your" answer is exactly what I put in my comment, except for the variable name. Where do you think you put your touch in it ? – Cath Sep 08 '16 at 08:30
  • @Cath I am not proficient in R. To be candid, you have only one comment on this question, and it says nothing apart from claiming that you answered (which was not an answer at all). I only learned about strsplit after zx8754 commented (hope that was not you). – Vinay Sep 08 '16 at 08:36
  • 1
    in this answer you're not using `strsplit`. All I'm saying is: what is the point of asking a question that has already been answered by code I gave you 2 weeks ago? (and I'm not talking about trying to say you came up with it...) Also, instead of asking variations of the same question again and again, maybe it's time you opened an R tutorial. – Cath Sep 08 '16 at 08:42
  • 1
    for a reminder : http://stackoverflow.com/questions/39166496/remove-anything-before-first-hypen#comment65677565_39166496 – Cath Sep 08 '16 at 08:44