
I have a dataset with values like "00MOC00281" and also values like "000001". I would like to remove the leading zeros only from values like "00MOC00281", which should become "MOC00281", while "000001" stays as it is.

I am trying to use `gsub` in R like this:

gsub("^0{2}(*[A-Z])", "", "00MOC0012B")

# [1] "OC0012B"

Any help appreciated.

Frank
Deepak
  • FYI, `gsub` replaces every match in a string, while `sub` replaces only the first; probably safer to use `sub`, as in the answer below. – Frank Sep 21 '18 at 18:47
  • I'd try to use `ifelse` here. Something like `ifelse(grepl("[A-Z]", your_strings), gsub("^0*", "", your_strings), your_strings)` – TinglTanglBob Sep 21 '18 at 18:51
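
A minimal sketch of the `ifelse` idea from that comment, assuming the values sit in a character vector (the name `codes` and its contents are hypothetical): strip leading zeros only from values that contain a letter, and return purely numeric values unchanged.

```r
# Hypothetical sample values; in practice this would be a column of your dataset
codes <- c("00MOC00281", "000001", "00MOC0012B")

# grepl() flags values containing at least one letter; only those get
# their leading zeros removed by sub(). All other values pass through as-is.
cleaned <- ifelse(grepl("[A-Za-z]", codes),
                  sub("^0+", "", codes),
                  codes)

print(cleaned)
```

`sub` and `grepl` are both vectorised, so `ifelse` just picks element-wise between the stripped and the original value.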

1 Answer


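We can use a positive lookahead. This regex matches the leading zeros only if they are followed by an uppercase letter. Since lookarounds are zero-length assertions, the letter is not part of the match, so `sub` leaves it in place (lookarounds need the PCRE engine, hence `perl = TRUE`):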

sub("^0+(?=[A-Z])", "", c("00MOC0012B", "000001"), perl = TRUE)

# [1] "MOC0012B" "000001"
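
To see why the lookahead matters, here is a small sketch comparing it with a pattern that consumes the letter, using the same two sample values (the variable names are illustrative only). When `[A-Z]` is part of the match, the letter is deleted along with the zeros, which is what happened in the question.

```r
x <- c("00MOC0012B", "000001")

# Consuming version: the letter is part of the match, so it is deleted too
consumed <- sub("^0+[A-Z]", "", x)

# Lookahead version: (?=[A-Z]) only checks that a letter follows; the letter
# itself is not matched, so it survives. Lookarounds require perl = TRUE.
kept <- sub("^0+(?=[A-Z])", "", x, perl = TRUE)

print(consumed)
print(kept)
```

In both cases "000001" is untouched, because no run of leading zeros is followed by a letter.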
acylam
  • Or (probably less efficient) match the whole string `sub("^0{2}([A-Z].*)$", "\\1", x)` – Frank Sep 21 '18 at 18:49
  • Thanks a lot, avid_useR and Frank. Actually, I forgot to mention that it can be any character, not just 'M', so I changed the solution given by @avid_R to `sub("^0+(?=[A-z])", "", c("00MOC0012B", "000001"), perl = TRUE)` and it worked. Frank, may I ask what "\\1" means? Many thanks in advance. – Deepak Sep 21 '18 at 19:33
  • @DEEPAKAGARWAL Edited my answer. `\\1` references the first "capture group". A capture group is a way to group character matches together using parentheses. In this case, there is only one capture group, `([A-Z].*)`, in the entire match, so `sub` replaces the entire match with whatever is matched by the first and only capture group. – acylam Sep 21 '18 at 19:40
  • Many thanks, avid_useR. Just one more question: is it that we want to replace the match but keep the capture group? If I use "" instead of "\\1", it removes the first character as well. – Deepak Sep 21 '18 at 19:48
  • @DEEPAKAGARWAL, not quite. `^0{2}([A-Z].*)$` matches your entire string, and `([A-Z].*)` matches only the part after the leading zeros. So if you write `""`, the entire match is removed, whereas with `\\1` the entire string is replaced with whatever `([A-Z].*)` matched. – acylam Sep 21 '18 at 19:52
  • @DEEPAKAGARWAL Do not use `"^0+(?=[A-z])"` as [`[A-z]` matches chars other than letters, too](https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret/29771926#29771926). Use `sub("^0+(?=[A-Z])", "", c("00MOC0012B", "000001"), perl = TRUE, ignore.case=TRUE)`, or `sub("^0+(?=[A-Za-z])", "", c("00MOC0012B", "000001"), perl = TRUE)`, or `sub("(?i)^0+(?=[A-Z])", "", c("00MOC0012B", "000001"), perl = TRUE)`, or `sub("^0+(?=[[:alpha:]])", "", c("00MOC0012B", "000001"), perl = TRUE)` – Wiktor Stribiżew Sep 21 '18 at 22:05
  • Thanks a lot, avid_useR and Wiktor. I really appreciate you taking the time to explain all this. – Deepak Sep 23 '18 at 01:38
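
A sketch of two points raised in the comments, using hypothetical sample strings. First, with the whole-string pattern `^0{2}([A-Z].*)$`, the replacement `"\\1"` puts back the text captured by `([A-Z].*)`, while `""` deletes the entire match (here, the entire string). Second, `[A-z]` covers the ASCII range from `Z` to `a`, which also includes `[`, `\`, `]`, `^`, `_` and the backtick, so it can strip zeros in front of non-letters; the strings "00_ABC" and "00abc" are made up just to show this.

```r
x <- c("00MOC0012B", "000001")

# Whole-string match; "\\1" reinserts what the first capture group matched
with_backref <- sub("^0{2}([A-Z].*)$", "\\1", x)

# With "" as the replacement, the whole match (the whole string here) is removed
without_backref <- sub("^0{2}([A-Z].*)$", "", x)

# Hypothetical strings: "_" lies between "Z" and "a" in ASCII
y <- c("00_ABC", "00abc")

# [A-z] accepts "_" as if it were a letter
loose <- sub("^0+(?=[A-z])", "", y, perl = TRUE)

# [A-Za-z] accepts only ASCII letters, so "00_ABC" is left alone
strict <- sub("^0+(?=[A-Za-z])", "", y, perl = TRUE)

print(with_backref)
print(without_backref)
print(loose)
print(strict)
```

Note that `^0{2}` requires exactly two leading zeros, whereas `^0+` in the lookahead version accepts any number.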