I'm writing code for a simple text miner that encapsulates a few regex's and whos goal is to essentially tick off a certain indicator if that regex is identified (simple enough). Problem is a few of these regex's include special characters ([,],/) that have to be escaped and therefore require a double backslash infront of them to work (I'm using impala as my db, not sure if regex syntax vary between db's). For example (from my R code):
mutate(indicator = case_when(
sql("(body) RLIKE '(confirmation[ :\\[\\]review]*n[^\\/y])'") ~ "NULL",
sql("(body) RLIKE '(confirmation[ :\\[\\]review]*y[^\\/n])|(other similar regex)'") ~ "Y", ...,
TRUE ~ "NULL")) %>% ....
The framework of what I need is the word "confirmation", followed by any of the characters ( ,:,[,],r,e,v,i,e,w), preceded by the letter "y", and not followed by (/,n). This regex works for what I need (if I query in sql). The problem arises when I try the same query in R using dbplyr (code snipet shown below), it seemingly doesn't identify the same substrings I just saw it do in sql. Investigating further (using the show_query argument), I noticed that the dbplyr query removes by default one of the \ in my escape argument:
....
CASE
WHEN ((body) RLIKE '(confirmation[ :\[\]review]*n[^\/n])') THEN 'NULL'
WHEN ((body) RLIKE '(confirmation[ :\[\]review]*y[^\/n])|(more similar expressions) THEN 'Y'
WHEN .......
ELSE 'NULL'
END AS `coverage_indicator`, ....
I'm not an expert in regex so I don't know the consequences of this but there are some backslashes in the text I'm mining that cannot be part of those unallowed characters (they're inevitably there). Are there any ways to avoid this disappearing \ or workarounds that allow me to achieve what I need out of the regex?
Note: The entire query is quite long and there is are lot of parts unimportant to this question, If need be I can include more.