I am trying to parse a set of strings. I need to find out whether 'bcl-2' was detected in the sample. For example: "bcl-2 was detected in 45% of patients". However, there are certain possible variations that are challenging:
1. "bcl-2 was detected in 45% bcl-6 was not detected"
2. "bcl-2 was not detected bcl-6 was detected in 45%"
3. "no evidence of bcl-2 bcl-6 was detected in 45%"
So I am trying to define the regex code that would:
1. Lookahead for 'bcl-2'
2. Then, lookahead from that point for 'detected'
3. Then lookbehind between 'bcl-2' and 'detected' to make sure there is no 'not'.
4. If possible, lookbehind from 'bcl-2' to check whether it is preceded by 'no evidence of', in which case it should not count as detected (though I can take care of this condition separately)
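To make the intended behaviour concrete, here is a rough sketch of what I expect the four steps to return for the three example strings. The helper name `bcl2_detected` is made up for illustration, and the pattern is only one possible way to express the steps (it consumes the text between 'bcl-2' and 'detected' one character at a time while refusing to cross 'not' or another 'bcl-N' marker). It assumes each marker's statement starts with its 'bcl-N' name.

```r
# Hypothetical helper; the name and the patterns are illustrative assumptions.
bcl2_detected <- function(x) {
  # Steps 1-3: 'bcl-2', then 'detected', with neither 'not' nor another
  # 'bcl-<digit>' marker anywhere in between.
  positive <- "bcl-?2\\b(?:(?!\\bnot\\b|\\bbcl-?\\d).)*?\\bdetected"
  # Step 4: 'no evidence of' directly in front of 'bcl-2'.
  negated <- "no evidence of\\s+bcl-?2\\b"
  grepl(positive, x, ignore.case = TRUE, perl = TRUE) &
    !grepl(negated, x, ignore.case = TRUE, perl = TRUE)
}

examples <- c(
  "bcl-2 was detected in 45% bcl-6 was not detected",
  "bcl-2 was not detected bcl-6 was detected in 45%",
  "no evidence of bcl-2 bcl-6 was detected in 45%"
)
bcl2_detected(examples)
```

Only the first string should come back as TRUE. I would still like to understand how (or whether) this can be done with lookbehind.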
I tried the following code which doesn't work. Specifically it doesn't lookbehind, so I am guessing there is something inherent to lookbehind that I am missing.
This regex works for "bcl-2 was not detected" but fails for "bcl-2 was detected in 45% bcl-6 was not detected"
y="bcl-2 was detected in 45% bcl-6 was not detected"
grepl("(?=bcl-?2)(?!.*not)(?=.*detected)",y, ignore.case = T,perl=T)
So I thought this would work, but it doesn't:
grepl("(?=bcl-?2)(?=.*detected)(?<!not)",y, ignore.case = T,perl=T)
I am trying to understand the logic of lookbehind. Regarding the last line of code: I thought (?=bcl-?2) looks forward to the point in the string where 'bcl-2' begins. Then I thought (?=.*detected) looks forward to the position where 'detected' starts, and that the lookbehind then looks backwards from that position for 'not'. This is evidently wrong, so what am I missing about the lookaround logic?
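A minimal check that may help pin down the question. It uses `regexpr` to report where the two lookaheads match and how long that match is. If I read the output correctly, the match is empty and sits at the start of 'bcl-2', which would mean the lookaheads do not move the position and the lookbehind is tested there rather than at 'detected'. That interpretation is my assumption.

```r
y <- "bcl-2 was detected in 45% bcl-6 was not detected"
z <- "bcl-2 was not detected"

# The two lookaheads on their own: where is the match, and how long is it?
m <- regexpr("(?=bcl-?2)(?=.*detected)", y, ignore.case = TRUE, perl = TRUE)
match_start  <- as.integer(m)            # 1-based start of the match
match_length <- attr(m, "match.length")  # 0 means an empty (zero-width) match

# With the lookbehind appended, both strings give the same answer,
# even though only one of them contains 'not' before 'detected'.
res_y <- grepl("(?=bcl-?2)(?=.*detected)(?<!not)", y, ignore.case = TRUE, perl = TRUE)
res_z <- grepl("(?=bcl-?2)(?=.*detected)(?<!not)", z, ignore.case = TRUE, perl = TRUE)

print(c(start = match_start, length = match_length))
print(c(y = res_y, z = res_z))
```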
BTW, a great website I have been using to try to figure this out: https://www.regular-expressions.info/recurse.html