I have a vector of strings:

ids <- c("NM_006690.2_PROBE1","333212.1_PROBE1","7602049CB1_PROBE1","NM_018065.1_PROBE1","1539036CB1_PROBE1","NM_021019.1_PROBE1","1440608CB1_PROBE1","NM_031270.1_PROBE1","613678CB1_PROBE1")

There is already a lot of discussion on this here: extract a substring in R according to a pattern.

I want to remove everything from the dot (.) onward, and I also want to remove _PROBE and everything after it. I managed to handle the dot part with:

read.table(text = ids, sep = ".", as.is = TRUE, fill=TRUE)$V1
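For reference, here is a minimal sketch of what that call returns on the vector above. The ids that contain a dot are cut correctly, but the ids without a dot keep their _PROBE1 suffix, which is the remaining problem. The name v1 is only illustrative.

```r
ids <- c("NM_006690.2_PROBE1", "333212.1_PROBE1", "7602049CB1_PROBE1",
         "NM_018065.1_PROBE1", "1539036CB1_PROBE1", "NM_021019.1_PROBE1",
         "1440608CB1_PROBE1", "NM_031270.1_PROBE1", "613678CB1_PROBE1")

# Split each id on "." and keep only the first field (V1);
# fill = TRUE pads rows that contain no dot with NA in V2
v1 <- read.table(text = ids, sep = ".", as.is = TRUE, fill = TRUE)$V1

# Ids without a dot still carry their "_PROBE1" suffix
v1[grepl("_PROBE", v1)]
```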

Now I also need to remove the _PROBE part in ids that have no dot, such as 613678CB1_PROBE1, so that the result is 613678CB1. How can I do this?

Desired output:

c("NM_006690", "333212", "7602049CB1", "NM_018065", "1539036CB1", "NM_021019", "1440608CB1", "NM_031270", "613678CB1")

Note: some ids contain two underscores, one after NM and one before PROBE. Only _PROBE and everything after it should be removed; the underscore in NM_ must be kept.
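This is why a pattern anchored on the first underscore is not enough: it would also cut the NM_ ids. A small sketch comparing the two patterns on two of the ids (the variable names are only illustrative, and the dot part is not handled here):

```r
ids <- c("NM_006690.2_PROBE1", "613678CB1_PROBE1")

# Naive: remove from the FIRST underscore onward, which also truncates NM_ ids
naive <- sub("_.*", "", ids)
naive     # "NM" and "613678CB1"

# Targeted: remove only from "_PROBE" onward; the NM_ underscore survives
targeted <- sub("_PROBE.*", "", ids)
targeted  # "NM_006690.2" and "613678CB1"
```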

Hashim

2 Answers

It seems like you're asking for:

gsub("\\..*|_PROBE.*", "", ids)

Demo:

gsub("\\..*|_PROBE.*", "", ids)
# [1] "NM_006690"  "333212"     "7602049CB1" "NM_018065"  "1539036CB1"
# [6] "NM_021019"  "1440608CB1" "NM_031270"  "613678CB1" 
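The alternation | lets one pattern cover both cases: the regex engine removes the leftmost match of either alternative (a dot, or _PROBE), and the trailing .* then consumes the rest of the string. As a sketch, this behaves like two sequential sub() calls, which some readers may find easier to follow; the intermediate names are only illustrative.

```r
ids <- c("NM_006690.2_PROBE1", "333212.1_PROBE1", "7602049CB1_PROBE1",
         "NM_018065.1_PROBE1", "1539036CB1_PROBE1", "NM_021019.1_PROBE1",
         "1440608CB1_PROBE1", "NM_031270.1_PROBE1", "613678CB1_PROBE1")

# Step 1: drop the first "." and everything after it (no-op when there is no dot)
no_version <- sub("\\..*", "", ids)

# Step 2: drop "_PROBE" and everything after it (no-op if step 1 already removed it)
clean <- sub("_PROBE.*", "", no_version)

# Same result as the single alternation pattern from the answer
stopifnot(identical(clean, gsub("\\..*|_PROBE.*", "", ids)))
clean
```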
A5C1D2H2I1M1N2O1R2T1

Do you just want this?

ids <- c("NM_006690.2_PROBE1", "333212.1_PROBE1", "7602049CB1_PROBE1", "NM_018065.1_PROBE1",
         "1539036CB1_PROBE1", "NM_021019.1_PROBE1", "1440608CB1_PROBE1", "NM_031270.1_PROBE1",
         "613678CB1_PROBE1")
ids <- read.table(text = ids, sep = ".", as.is = TRUE, fill=TRUE)$V1

library(stringr)
ids <- str_replace(ids, "_PROBE1", "")

which gives you this:

"NM_006690"  "333212"     "7602049CB1" "NM_018065"  "1539036CB1" "NM_021019"  "1440608CB1" "NM_031270"  "613678CB1"  
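If you would rather avoid the stringr dependency, here is a base R sketch of the same two steps. Like the code above, it assumes the suffix is always exactly _PROBE1; fixed = TRUE makes sub() treat the pattern as a literal string rather than a regular expression.

```r
ids <- c("NM_006690.2_PROBE1", "333212.1_PROBE1", "7602049CB1_PROBE1",
         "NM_018065.1_PROBE1", "1539036CB1_PROBE1", "NM_021019.1_PROBE1",
         "1440608CB1_PROBE1", "NM_031270.1_PROBE1", "613678CB1_PROBE1")

# Step 1: keep everything before the first "." (the read.table() approach above)
ids <- read.table(text = ids, sep = ".", as.is = TRUE, fill = TRUE)$V1

# Step 2: base R counterpart of str_replace(ids, "_PROBE1", ""):
# remove the first literal occurrence of "_PROBE1" in each string
ids <- sub("_PROBE1", "", ids, fixed = TRUE)
ids
```

If other probe numbers can occur, a regular expression such as "_PROBE[0-9]+$" (without fixed = TRUE) would be needed instead.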
grrgrrbla