
I need to replace the prefix matching "CBII_*_*_" with "MAP_" in the vector tt below.

tt <- c("CBII_27_1018_62770", "CBII_2733_101448_6272", "MAP_1222")

I tried

gsub("CBII_*_*", "MAP_", tt), which doesn't give the expected result. How can I get the following?

"MAP_62770", "MAP_6272", "MAP_1222"

Yamuna_dhungana

3 Answers

You can use:

gsub("^CBII_.*_.*_", "MAP_", tt)

or

stringr::str_replace(tt, "^CBII_.*_.*_", "MAP_")

Output

[1] "MAP_62770" "MAP_6272"  "MAP_1222"
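
For context, a short sketch of why the pattern in the question falls short: in a regular expression `*` is a quantifier on the preceding character rather than a wildcard, so `_*` means "zero or more underscores". The second pattern below is a stricter variant and rests on an assumption not stated in the question, namely that the two middle fields always consist of digits.

```r
tt <- c("CBII_27_1018_62770", "CBII_2733_101448_6272", "MAP_1222")

# "CBII_*_*" is "CBII" followed by zero or more underscores (twice),
# so only the literal prefix "CBII_" gets replaced.
attempt <- gsub("CBII_*_*", "MAP_", tt)
print(attempt)
# [1] "MAP_27_1018_62770"    "MAP_2733_101448_6272" "MAP_1222"

# Stricter variant (assumption: the middle fields are always digits).
# sub() is enough here because each string contains at most one match.
result <- sub("^CBII_\\d+_\\d+_", "MAP_", tt)
print(result)
# [1] "MAP_62770" "MAP_6272"  "MAP_1222"
```
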
slava-kohut

One option uses trimws from base R together with paste0. The whitespace argument (available since R 3.6.0) is treated as a regular expression, so setting it to ".*_" makes trimws strip everything up to and including the last _. paste0 then prepends the new prefix ("MAP_").

paste0("MAP_", trimws(tt, whitespace = ".*_"))
#[1] "MAP_62770" "MAP_6272"  "MAP_1222" 
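
To make the trimming step more explicit, here is a rough equivalent written with sub. It is only a sketch; the variable names suffix and suffix_trimws are illustrative and not part of the answer above.

```r
tt <- c("CBII_27_1018_62770", "CBII_2733_101448_6272", "MAP_1222")

# Greedy ".*_" consumes everything up to and including the last underscore.
suffix <- sub("^.*_", "", tt)

# The same trimming restricted to the left side of each string
# (the `whitespace` argument needs R >= 3.6.0).
suffix_trimws <- trimws(tt, which = "left", whitespace = ".*_")

# Prepend the new prefix to the remaining digits.
result <- paste0("MAP_", suffix)
print(result)
```

Restricting trimws to which = "left" is optional for this input, since none of the strings end in an underscore.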
akrun
sub(".*(?<=_)(\\d+)$", "MAP_\\1", tt, perl = TRUE)
#[1] "MAP_62770" "MAP_6272"  "MAP_1222"

Here we use a positive lookbehind to assert that there is an underscore _ immediately to the left of the capturing group (\\d+), which sits at the very end of the string ($). We then recall that capturing group with \\1 in the replacement argument to sub and put MAP_ in front of it.
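
As a possible simplification, the same idea can be sketched without a lookbehind by matching the underscore directly in front of the capture group, which also removes the need for perl = TRUE. This is a variant for comparison, not the code from the answer above.

```r
tt <- c("CBII_27_1018_62770", "CBII_2733_101448_6272", "MAP_1222")

# ".*_" greedily consumes everything up to the last underscore;
# ([0-9]+)$ captures the trailing digits so they can be reused as \\1.
result <- sub(".*_([0-9]+)$", "MAP_\\1", tt)
print(result)
# [1] "MAP_62770" "MAP_6272"  "MAP_1222"
```
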

Chris Ruehlemann