
I have a dataset in which one of the variables has a non-uniform pattern/format, and I need to write R code that removes the part of each string that follows a specific pattern.

There are existing posts on replacing patterns, such as "Extract a string between patterns/delimiters in R", "Replace patterns separated by delimiter in R", and "Remove part of a string", but they don't address the issue in my data.

This is what the variable `c` looks like. Below are the options I tried, along with their results.

library(stringr) # str_replace_all() and fixed() come from stringr
c <- c("1998/123; 2001","181;2002/12","212")
c1 <- gsub("[0-9]/[0-9]", "", c) # returns "19923; 2001" "181;2002" "212"
c2 <- gsub("[0-9]/*", "", c) # returns "; " ";" ""
c3 <- gsub("[0-9][0-9][0-9][0-9]/", "", c) # returns "123; 2001" "181;12" "212"
c4 <- gsub("*[0-9]/[0-9]*", "", c) # returns 199, 200, 212 
c5 <- gsub(" */* ", "", c) # only removes the space: "1998/123;2001" "181;2002/12" "212"
c6 <- str_replace_all(c,"/","") # returns "1998123; 2001" "181;200212" "212"
c7 <- grep(fixed("/"), c, invert=TRUE, value = TRUE) # returns "212"
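To see why the first attempt leaves most of the digits behind, a quick check with base R's `regmatches()` and `gregexpr()` prints exactly what a pattern matches. This is a diagnostic sketch, not part of the original question; it uses `x` instead of `c` only to avoid shadowing the `c()` function.

```r
x <- c("1998/123; 2001", "181;2002/12", "212")

# Without a quantifier, each [0-9] matches exactly one digit,
# so only "8/1" and "2/1" are found (and therefore removed by gsub).
single <- regmatches(x, gregexpr("[0-9]/[0-9]", x))

# Adding + lets each side of the slash match a whole run of digits.
runs <- regmatches(x, gregexpr("[0-9]+/[0-9]+", x))

str(single)
str(runs)
```

The third element has no slash, so both lists hold `character(0)` there.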

a) There can be 3-8 digits after the forward slash, but only 4 digits before it.

b) Substrings are separated by semicolons.

c) I want to replace the substrings that contain a forward slash with an empty string, so my result should be c("; 2001", "181;", "212").
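Rules (b) and (c) can also be followed literally, without writing a slash pattern at all: split on the semicolon, blank out any piece containing "/", and paste the pieces back together. A minimal base-R sketch of that alternative (not the approach used in the answer):

```r
x <- c("1998/123; 2001", "181;2002/12", "212")

# Split each string on the semicolon delimiter (rule b).
parts <- strsplit(x, ";", fixed = TRUE)

# Replace every substring that contains a forward slash with "" (rule c).
blanked <- lapply(parts, function(p) {
  p[grepl("/", p, fixed = TRUE)] <- ""
  p
})

# Glue the pieces back together with the original delimiter.
result <- vapply(blanked, paste, character(1), collapse = ";")
print(result)
```

One caveat: `strsplit()` drops a trailing empty field, so an input that already ends in ";" would lose that final semicolon after the round trip.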

Please let me know where I am going wrong. Any suggestions are very welcome. Thanks.

1 Answer


Since the numbers before and after the forward slash have multiple digits, you can add a quantifier to your first approach, + (one or more) or * (zero or more), so that all of them are removed:

c <-  c("1998/123; 2001","181;2002/12","212")

gsub("\\d+\\/\\d+", "", c)
#> [1] "; 2001" "181;"   "212"
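If rule (a) should be enforced, so that only tokens with exactly four digits before the slash are removed, the pattern can be tightened. This is a sketch of one possible variant, not part of the original answer. Note that the sample value "2002/12" has only two digits after the slash, so a `{3,8}` quantifier there would leave it in place; `\d+` is kept after the slash instead.

```r
x <- c("1998/123; 2001", "181;2002/12", "212")

# \b anchors the match at the start of a number and \d{4} requires
# exactly four digits before the slash; \d+ accepts any run after it.
# perl = TRUE makes PCRE handle the \b word boundary.
strict <- gsub("\\b\\d{4}/\\d+", "", x, perl = TRUE)
print(strict)

# A five-digit prefix fails the four-digit rule and is kept...
kept <- gsub("\\b\\d{4}/\\d+", "", "12345/678", perl = TRUE)
# ...whereas the looser \d+/\d+ pattern removes it entirely.
loose <- gsub("\\d+/\\d+", "", "12345/678")
```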
stefan
  • Thanks a ton! Yes, it is working fine now. Suppose the forward slash were followed by alphanumeric characters instead of digits: how would we tweak the code? – PhD Student Jun 26 '22 at 21:19
  • In that case use `gsub("[[:alnum:]]+/[[:alnum:]]+", "", c)`. See e.g. `?base::regex`. – stefan Jun 26 '22 at 21:22
  • @PhDStudent Please consider closing the question by setting the green check mark next to the answer. This will help future SO users identify relevant questions & answers. I recommend doing the same for other questions you have asked & have got an answer for. – Maurits Evers Jun 26 '22 at 23:00
  • @MauritsEvers - Thank you for letting me know. I tried clicking the up arrow ("This answer is useful") but couldn't, since at least 15 reputation is needed to cast a vote. Will do the green check for my other questions, too! – PhD Student Jun 26 '22 at 23:09
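A quick check of the `[[:alnum:]]` pattern suggested in the comments, using made-up alphanumeric values (the strings in `y` are hypothetical, not from the question):

```r
# Hypothetical input: letters and digits after the slash.
y <- c("1998/A12b; 2001", "181;2002/x9", "212")

# \d+ requires a digit right after the slash, so nothing is removed here.
digits_only <- gsub("\\d+/\\d+", "", y)

# [[:alnum:]] matches letters or digits, so the whole token is removed.
alnum <- gsub("[[:alnum:]]+/[[:alnum:]]+", "", y)

print(digits_only)
print(alnum)
```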