
I have a column as below.

9453, 55489, 4588, 18893, 4457, 2339, 45489HQ, 7833HQ

I would like to add a leading zero if the number has fewer than 5 digits. However, some numbers end in "HQ" and some don't. (I did check other posts, but none of them deal with the "HQ" part.)

So the final desired output should be:

09453, 55489, 04588, 18893, 04457, 02339, 45489HQ, 07833HQ

Any idea how to do this? Thank you so much for reading my post!

C_Mu
  • Not exactly the same, as that question doesn't answer the case if there are letters after the number. – thc Jan 24 '18 at 21:57
  • @RichScriven I did check the post; it is not the same problem. Because of the "HQ" suffix, my strings have no fixed length. It is NOT a duplicate question – C_Mu Jan 24 '18 at 21:57

4 Answers


A one-liner using regular expressions:

my_strings <- c("9453", "55489", "4588", 
      "18893", "4457", "2339", "45489HQ", "7833HQ")

gsub("^([0-9]{1,4})(HQ|$)", "0\\1\\2", my_strings)

[1] "09453"   "55489"   "04588"   "18893"   
    "04457"   "02339"   "45489HQ" "07833HQ"

Explanation:

^ start of string
[0-9]{1,4} one to four digits in a row
(HQ|$) the string "HQ" or the end of the string

Parentheses define capture groups, numbered in order. So 0\\1\\2 means a literal 0 followed by the first capture group [0-9]{1,4} and then the second capture group HQ|$.

Of course, if a string starts with 5 digits, the regex doesn't match, so the string is left unchanged.
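To see what the two capture groups actually hold, a small sketch can run the same pattern through regexec() and regmatches() instead of gsub(). It only inspects the matches and does not change the answer's method:

```r
my_strings <- c("9453", "55489", "4588", "18893",
                "4457", "2339", "45489HQ", "7833HQ")

# regexec() records where the whole pattern and each capture group matched;
# regmatches() turns those positions into character vectors
# (full match first, then group 1, then group 2)
parts <- regmatches(my_strings,
                    regexec("^([0-9]{1,4})(HQ|$)", my_strings))

parts[[1]]  # "9453" "9453" ""  -> group 2 matched the end of the string
parts[[8]]  # "7833HQ" "7833" "HQ"
parts[[2]]  # character(0)      -> 5 digits, no match, so gsub() leaves it alone

# Only a single "0" is ever inserted, so a 3-digit number stays short
gsub("^([0-9]{1,4})(HQ|$)", "0\\1\\2", "123")  # "0123"
```

The last line illustrates the limitation raised in the first comment below: the replacement adds exactly one zero, so it only produces 5 digits when the input has exactly 4.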

thc
  • A good solution, although this won't work if the number is less than 4 digits. OP should probably clarify if they have 3 or less digits in any case. – thelatemail Jan 24 '18 at 21:56
  • Try `my_strings %>% paste("0000", ., sep='') %>% gsub("^.*([0-9]{5})(HQ|$)", "\\1\\2", .)`. It will work for any number of digits ;) – Costin Jan 24 '18 at 23:02

I was going to use the sprintf() approach, but found that the stringr package provides a very easy solution.

library(stringr)
x <- c("9453", "55489", "4588", "18893", "4457", "2339", "45489HQ", "7833HQ")
x
[1] "9453"    "55489"   "4588"    "18893"   "4457"    "2339"    "45489HQ" "7833HQ"

This can be converted with a single call to stringr::str_pad():

stringr::str_pad(x, 5, side="left", pad="0")
[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "7833HQ" 
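Note that "7833HQ" comes back unchanged. str_pad() treats width as the minimum width of the whole string, and the "HQ" suffix counts toward that width. A quick base-R check of the widths (a sketch; it only uses nchar() to show which elements fall below 5 characters) makes this visible:

```r
x <- c("9453", "55489", "4588", "18893", "4457", "2339", "45489HQ", "7833HQ")

# nchar() counts every character, including the "HQ" suffix
widths <- nchar(x)
widths          # 4 5 4 5 4 4 7 6

# Only these elements are shorter than 5, so only these get padded
x[widths < 5]   # "9453" "4588" "4457" "2339"
```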

If the number needs to be padded even when the total string width is already 5 or more (as with "7833HQ"), the number and the text need to be separated with a regex. The following works; it combines regex matching with the very helpful sprintf() function:

sprintf("%05.0f%s", # this encodes the format and recombines the number with padding (%05.0f) with text(%s)
        as.numeric(gsub("^(\\d+).*", "\\1", x)), #get the number
        gsub("[[:digit:]]+([a-zA-Z]*)$", "\\1", x)) #get just the text at the end
[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "07833HQ"
Matt L.
  • Expected `07833HQ` instead of "7833HQ" – Costin Jan 24 '18 at 22:42
  • ah- thanks, it does look like this approach won't work if that is the desired output, and some regex is needed to split the numeric portion out. The other answers accomplish that nicely. – Matt L. Jan 24 '18 at 22:54

Another attempt, which will also work in cases like "123" or "1HQR":

x <- c("18893","4457","45489HQ","7833HQ","123", "1HQR")
regmatches(x, regexpr("^\\d+", x)) <- sprintf("%05d", as.numeric(sub("\\D+$","",x)))
x
#[1] "18893"    "04457"    "45489HQ"  "07833HQ"  "00123"    "00001HQR"

This finds the run of digits at the start of each string (^\\d+) and replaces it with a zero-padded (via sprintf) version of the number, obtained by removing any non-digit characters (\\D+$) from the end of the string. Any suffix such as "HQ" is outside the matched digits, so it is kept as-is.
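The one-liner above can be unpacked into its three steps to show the intermediate values. This is the same technique, just split up so each piece can be printed:

```r
x <- c("18893", "4457", "45489HQ", "7833HQ", "123", "1HQR")

# Step 1: locate the leading run of digits in each string
m <- regexpr("^\\d+", x)
regmatches(x, m)      # "18893" "4457" "45489" "7833" "123" "1"

# Step 2: drop trailing non-digits and zero-pad what remains
# (sprintf's %d accepts doubles that hold whole numbers)
padded <- sprintf("%05d", as.numeric(sub("\\D+$", "", x)))
padded                # "18893" "04457" "45489" "07833" "00123" "00001"

# Step 3: overwrite only the matched digits; any suffix is untouched
regmatches(x, m) <- padded
x                     # "18893" "04457" "45489HQ" "07833HQ" "00123" "00001HQR"
```

Because `regmatches<-` writes into the matched region only, the text after the digits never has to be extracted and pasted back.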

thelatemail

We can use only sprintf() and gsub() by splitting up the parts and then putting them back together. Using @thelatemail's data:

x <- c("18893", "4457", "45489HQ", "7833HQ", "123", "1HQR")
sprintf("%05d%s", as.numeric(gsub("[^0-9]+", "", x)), gsub("[0-9]+", "", x))
# [1] "18893"    "04457"    "45489HQ"  "07833HQ"  "00123"    "00001HQR"
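For reference, the two gsub() calls produce the following parts. The "12A3" input at the end is a made-up example (not from the question) showing a caveat: since each gsub() removes characters anywhere in the string, digits that appear after letters are merged into the number. That is harmless for data shaped like "7833HQ":

```r
x <- c("18893", "4457", "45489HQ", "7833HQ", "123", "1HQR")

num_part  <- gsub("[^0-9]+", "", x)   # keeps every digit, wherever it is
text_part <- gsub("[0-9]+", "", x)    # keeps every non-digit, wherever it is
num_part    # "18893" "4457" "45489" "7833" "123" "1"
text_part   # "" "" "HQ" "HQ" "" "HQR"

# Hypothetical input with a digit after the letters
y <- "12A3"
caveat <- sprintf("%05d%s", as.numeric(gsub("[^0-9]+", "", y)),
                  gsub("[0-9]+", "", y))
caveat      # "00123A"
```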
Rich Scriven