
This is my first question (I'm still learning R), so I apologize in advance if it is too basic.
I'm trying to write a regex that matches the first string below but not the second one.

strings <- c("p1_32_XYX_cancer_1", "p1_32_XYX_cancer_ttt_1")

I tested on regex101, and the best I came up with is the pattern below (it works on regex101). However, when I try to use it in R, I get the following error:

"(^p5[0-9].*XYX.*cancer)(?!.*ttt)"
Error in grep(needle, haystack, ...) : invalid regular expression 'mz|(^p5[0-9].*XYX.*cancer)(?!.*ttt)', reason 'Invalid regexp'

Sorry for being unclear earlier. The exact code is:

ctc_gastric_df <- select(m,matches("mz|(^p5[0-9].*XYX.*cancer)(?!.*ttt)"))

Ahmed Ali

1 Answer


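The negative lookahead (?!...) is Perl-style syntax, which R's default regex engine (TRE) does not support, hence the 'Invalid regexp' error. We need perl = TRUE to make the regex in the OP's code work without the error: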

grep("(^p5[0-9].*XYX.*cancer)(?!.*ttt)", strings, perl = TRUE)
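
A minimal sketch of the same idea on the question's sample strings, using base R only. Note that the OP's pattern starts with ^p5[0-9], so it cannot match strings that begin with "p1"; the "^p1" prefix below is an assumption made purely so the demo returns something. The data frame m here is a hypothetical stand-in for the OP's data.

```r
# Sample strings from the question.
strings <- c("p1_32_XYX_cancer_1", "p1_32_XYX_cancer_ttt_1")

# Assumption: "^p1" replaces the OP's "^p5[0-9]" so that the sample strings
# can match at all. The rest of the pattern is unchanged.
pattern <- "(^p1.*XYX.*cancer)(?!.*ttt)"

# perl = TRUE switches grep() to the PCRE engine, which understands (?!...).
hits <- grep(pattern, strings, perl = TRUE, value = TRUE)
print(hits)

# Lookahead-free alternative: combine a positive and a negative test.
# This works with the default engine, so no perl = TRUE is needed.
keep <- grepl("^p1.*XYX.*cancer", strings) & !grepl("ttt", strings)
print(strings[keep])

# Column selection on a hypothetical data frame `m`, without dplyr.
m <- data.frame(1, 2, 3)
names(m) <- c(strings, "mz")
sel <- m[, grepl(paste0("mz|", pattern), names(m), perl = TRUE), drop = FALSE]
print(names(sel))
```

For the dplyr call in the question, recent tidyselect versions appear to accept a perl argument in matches(), as in matches("...", perl = TRUE). Treat that as an assumption and check it against the installed version; the base R subsetting above works either way.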
akrun
    Thanks a lot, and sorry about the post being a duplicate. I searched high and low for an answer but used the wrong terminology. – Ahmed Ali Feb 21 '18 at 09:14