0

I'm dynamically creating a string that will be used as a regular expression pattern. I'm creating it and using it in Snowflake SQL. There are reserved regular expression characters in it that I want to keep as part of the original text. For example:

'word1, word2, a.b.c, hot/cool, | general'

I'm going to convert those commas to a | so that we search through text and get a positive match if any of them are in the text. The | general may also be legitimately in the text so need to escape that. The ., / and many other reserved characters are in the text. Basically, I need to escape them all. I'm doing this transform in separate steps, so I can convert the commas to pipes after this escaping step.

This is the simplest test case and solution I can come up with:

select regexp_replace(
'+ . * ? ^ $ , [ ] { } ( ) | /',  -- text to escape
'\\+|\\.|\\*|\\?|\\^|\\$|\\,|\\[|\\]|\\{|\\}|\\(|\\)|\\||\\/',  -- pattern
'\\\\$0'  -- replace captured text with \\ in front of it
)

Even in this case I'm missing the \ in the original text to escape because it throws an error. The result of this is:

\$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0 \$0

I've tried many variations of backslashes before the $0 and nothing works.

Python has a re.escape() function. Javascript has ways of doing it (https://stackoverflow.com/a/3561711/1884101). I can't figure out any way to do this in Snowflake other than a UDF, which I would really like to avoid. Someone else tried my example above in Postgres and it worked.

Is there a way to do this in Snowflake SQL (including escaping the \)?

Jeff
  • 939
  • 10
  • 19

2 Answers2

1

Obviously, regexp_replace function has a bit different features on various databases. On snowflake, it works well

select REGEXP_REPLACE(
'+ . * ? ^ $ , [ ] { } ( ) | /',
'\\+|\\.|\\*|\\?|\\^|\\$|\\,|\\[|\\]|\\{|\\}|\\(|\\)|\\||\\/', -- escaped reserved characters
'\\\\\\\\\\0' -- I want to add \\ in front of every reserved character
)

4 backslashes could replace to one, that's why you wanna add two backslashes, 4*2 = 8 (backslashes).plus for getting matched patterns \\0

ArthurV
  • 26
  • 2
  • This works great. Not sure how to escape \ itself but not sure if it's possible. For my purposes, I can easily strip them out first.. – Jeff Aug 18 '20 at 14:13
0

Can you try this

select regexp_replace(
'+ . * ? ^ $ , [ ] { } ( ) | /',  -- text to escape
'\\+|\\.|\\*|\\?|\\^|\\$|\\,|\\[|\\]|\\{|\\}|\\(|\\)|\\||\\/',  -- pattern
'\\\\\\\\$0'  -- replace captured text with \\ in front of it
)
Rajib Deb
  • 1,496
  • 11
  • 30
  • I got this: `\\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0 \\$0` – Jeff Aug 17 '20 at 20:00