I was refining this answer and found that the regex given below does not behave as its meaning suggests in R.
" +?on.*$"
According to my understanding of regex, the above regex matches:
one or more spaces, matched lazily, followed by
on
followed by anything (except a newline) up to the end of the line.
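To make my reading of the lazy quantifier concrete, here is a minimal JavaScript sketch. The sample string and variable names are made up for illustration. It assumes the usual backtracking behaviour: laziness only affects how far the quantifier extends, not where the match starts, so the lazy and greedy forms find the same match here.

```javascript
// Hypothetical sample: two spaces before "on" (not from the real input).
const sample = "a  on b";

// Lazy form: one or more spaces, as few as possible, then "on".
const lazyMatch = / +?on/.exec(sample);
// Greedy form: one or more spaces, as many as possible, then "on".
const greedyMatch = / +on/.exec(sample);

// Both start at the leftmost space (index 1) and must consume both
// spaces before "on" can match.
console.log(lazyMatch.index, JSON.stringify(lazyMatch[0]));
console.log(greedyMatch.index, JSON.stringify(greedyMatch[0]));
```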
INPUT:
Posted by ondrej on 29 Feb 2020.
Posted by ona'je on 29 Feb 2020.
OUTPUT (what I expect if every match of the above regex in the test string is replaced by ""):
Posted by
Posted by
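As a sketch of why I expect this output, the following JavaScript reports where the first match starts on each input line. The helper name firstMatchPerLine is hypothetical; only the pattern and the two input lines come from this question.

```javascript
// The two input lines from the question.
const inputText =
  "Posted by ondrej on 29 Feb 2020.\nPosted by ona'je on 29 Feb 2020.";

// Hypothetical helper: find the first match of the pattern on each line.
function firstMatchPerLine(text) {
  return text.split("\n").map((line) => {
    const m = / +?on.*$/.exec(line);
    return m ? { index: m.index, matched: m[0] } : null;
  });
}

const lineMatches = firstMatchPerLine(inputText);
// The "m" flag makes $ match at the end of every line, "g" replaces all matches.
const strippedText = inputText.replace(/ +?on.*$/gm, "");

console.log(lineMatches);   // both matches start at index 9, right after "by"
console.log(strippedText);  // "Posted by" on both lines
```

In this engine the match starts at the first space that is directly followed by "on" (the one before "ondrej" or "ona'je"), so everything after "Posted by" is removed.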
When I test it in Python (implementation here), JavaScript and Java (implementation here), I get the result I expected.
const myString = "Posted by ondrej on 29 Feb 2020.\nPosted by ona'je on";
console.log(myString.replace( new RegExp(" +?on.*$","gm"),""));
On the other hand, when I use the same regex in R (implementation here), I get the result
Posted by ondrej
Posted by ona'je
and this is unexpected.
Doubt
I thought that maybe the regex parser for R works differently (perhaps from right to left). I read the documentation on how regex works in R but found nothing different from other languages for the above regex. I may be missing something here. I am not well-versed in R, but as far as my regex knowledge goes, I believe the above regex should work the way it does in Java, JavaScript and Python (and maybe PCRE too), as it would in every standard regex engine I know of. My question is: why does the above regex work differently in R?