What is the significance of 'E' in regex_replace?

Question

In our project, we move the data from tables on RDBMS to HDFS using Scala and Spark. Before moving the data, we apply a "regex_replace" on the data to eliminate some discrepancies in the data. Below is the regex_replace:

regexp_replace(
    regexp_replace(
        regexp_replace(
            regexp_replace(
                regexp_replace(..., E'[\\n]+', ' ', 'g' ),
                E'[\\r]+', ' ', 'g'
            ),
            E'[\\t]+', ' ', 'g'
        ),
        E'[\\cA]+', ' ', 'g'
    ),
    E'[\\ca]+', ' ', 'g'
)

What is the meaning of the E that preceeds the single quoted strings in each regexp_replace call?

What kind of regex flavour/dialect or language/environment/database are you talking about exactly?! — deceze, Aug 21 '18 at 16:06
I have a hunch that he's using Java. With a string for a PostgreSQL statement. If so, that `E` is probably to make it verbatim string. So that `E` is not even part of a regex. [example SO post](https://stackoverflow.com/a/938/4003419). — LukStorms, Aug 21 '18 at 19:08
@LukStorms I am trying to use this regex pattern in the language Scala. — Bobby, Aug 22 '18 at 18:10
I don't know Scala, nor Apache Spark. But it seems that all those (SQL for a PostgreSql database?) `replace_regexp` can be replaced by a single one. F.e.: `regexp_replace(..., E'[\\cA\\ca\\t\\r\\n]+',' ','g')` But then again, that might result in less spaces. — LukStorms, Aug 22 '18 at 21:56
Then the E is to make it an [escape string constant](https://stackoverflow.com/questions/34823158/whats-the-e-before-a-postgres-string) — LukStorms, Aug 23 '18 at 06:26

LukStorms · Accepted Answer · 2018-08-23T08:29:10.580

The 'E' is part of the SQL syntax, at least for PostgreSql.
It's used in the SQL to be able to escape characters with a backslash \.

And if such SQL string is put in a scala string (that isn't a raw string), then the backslashes need to be backslashed to get literal backslashes.

So to make the regex in the regex_replace function to see only one literal backslash?

val sqlstring = "select regexp_replace(col1, E'\\\\s+','','g') as col1_without_whitespace from yourtable"

Is equivalent to:

val sqlstring = "select regexp_replace(col1, '\\s+','','g') as col1_without_whitespace from yourtable"

Is equivalent to:

val sqlstring = raw"select regexp_replace(col1, '\s+','','g') as col1_without_whitespace from yourtable"

A SQL Fiddle test for that 'E' can be found here

What is the significance of 'E' in regex_replace?

1 Answers1