I want to convert text like this using Snowflake SQL:
A very(!) punctuated sentence! A complicated/confusing regex.
to this:
'A very ( ! ) punctuated sentence ! A complicated / confusing regex . '
Double spaces between punctuation is ok because I can do a second pass to compress whitespace. The punctuation list is
.,&-_()[]{}-/:;%$@!*|?~=+"\'
But if there is a standard shortcut for all punctuation I would consider that. I have seen answers that use Java regex that uses \p{Punct}
. But in my tests I can't use the punctuation identifier and don't see it in the Snowflake docs.
I have a working version that makes my eyes bleed and it's not even fully written out:
select regexp_replace(
'a very(!) punctuated sentence! A complicated/confusing regex?.',
'(\\(|\\)|\\/|!|\\?)', -- only addresses (), /, !, ?, not the full list
' \\1 '
) as "result" from table
result: "a very ( ! ) punctuated sentence ! A complicated / confusing regex ? ."
For some reason there are not double spaces, which makes me question the result as well as the readability of the implementation
My understanding is that character classes are more performant and simpler to visually parse. But this doesn't work:
select regexp_replace(
'a very(!) punctuated sentence! A complicated/confusing regex?.',
'[.,&-_()[]{}-/:;%$@!*|?~=+"\'`]',
' \\1 '
) as "result" from table
-- Error: no argument for repetition operator: ?
It also doesn't seem that back references are available to character classes.
Is there a way to write this query that is relatively performant and allows the reader to easily visually parse the punctuation list such as in the character classes above?