I'm trying to learn regex so I can use it in R.
Currently I'm just testing a few text substitution operations, and I looked at some examples on the internet. Then I tried out the operations below:
Making a list of some random words to test regex operations on:
mylist <- c("Calendar", "Vinegar", "Character", "Boiler", "Conductor", "Franchisor")
Trying to match the "or" in those words and replace them with "ee" - using the matching expression "^([a-zA-Z]*)or", and replacing the matched result with "\1ee", but it doesn't work:
sub("^([a-zA-Z]*)or","\1ee", mylist)
[1] "Calendar" "Vinegar" "Character" "Boiler" "\001ee" "\001ee"
Trying to match the "or" in those words and replace them with "ee" - using the same matching expression "^([a-zA-Z]*)or", but replacing the matched result with "\\1ee" (doubled backslash), which gives the expected result:
sub("^([a-zA-Z]*)or","\\1ee", mylist)
[1] "Calendar" "Vinegar" "Character" "Boiler" "Conductee" "Franchisee"
My question is: why do we have to use "\\1" to get backreferencing to work correctly? Isn't a backreference in regex normally written with a single backslash, "\1", rather than "\\1"?
From reading some sample code and examples on the internet, I gather that in R, when you want a literal backslash "\" character in a string, you have to write it as "\\". Is that the right interpretation in this case?
But doesn't R already recognise "\n" and "\t" as special escaped characters? We can use them directly in a string without any issue, so why not "\1"?
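To make the comparison concrete, here is a minimal sketch (base R only; the variable names are just for illustration) that counts the characters R stores for each literal and then writes one of them out without print-time escaping:

```r
# Character counts show how the R parser reads each escape sequence.
n_newline <- nchar("\n")    # the newline escape
n_single  <- nchar("\1")    # single backslash followed by 1
n_double  <- nchar("\\1")   # doubled backslash followed by 1

print(c(newline = n_newline, single = n_single, double = n_double))

# writeLines() shows the stored characters rather than the escaped form
# that print() displays.
writeLines("\\1ee")
```

If the counts come out as 1, 1 and 2, then only the "\\1" literal still contains a backslash character by the time it is passed to sub.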
Does that have anything to do with the fact that "^([a-zA-Z]*)or" and "\1ee" are passed as two separate arguments of the function sub? How is the function sub defined in R?
Also, a call to:
sub("^([a-zA-Z]*)or","\1ee", mylist)
produces
[1] "Calendar" "Vinegar" "Character" "Boiler" "\001ee" "\001ee"
How come it produces "\001ee"? Why did "\1" come out as "\001" if R was treating it as plain text? Does "\1" have any special meaning in R?
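One way to inspect what "\1" turns into, as a rough sketch using only base R (the names x, code and same are mine). As far as I can tell from ?Quotes, R's string syntax reads a backslash followed by one to three octal digits as a character code, so this checks whether "\1" and "\001" end up as the same single character:

```r
x <- "\1"

code <- utf8ToInt(x)          # integer code of the stored character
same <- identical(x, "\001")  # compare with the three-digit octal form

print(code)
print(same)
print(charToRaw(x))           # the underlying byte(s)
```

If same is TRUE, the "\001" in the output would just be how print() displays that one control character, not text that sub produced.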
[Edit] Thanks Wiktor for explaining the requirement for the literal "\". But can anyone please also explain the other questions in my post? That's why it is not an exact duplicate of the "how-to-escape-backslashes-in-r-string" topic.