0

Say I have the following query:

Select 
 COLUMN as RAW_COLUMN
,REGEX_REPLACE(COLUMN,'REGEX_PATTERN_HERE','REPLACEMENT_STRING_HERE') as CLEAN_COLUMN
FROM TABLE

As you can see above I created a Javascript function that allows me to replace a column value using regex patterns. Below is the function:

CREATE OR REPLACE FUNCTION "REGEX_REPLACE"("SUBJECT" VARCHAR(16777216), "PATTERN" VARCHAR(16777216), "REPLACEMENT" VARCHAR(16777216))
RETURNS VARCHAR(16777216)
LANGUAGE JAVASCRIPT
AS '
  
    const p = SUBJECT;
    let regex = new RegExp(PATTERN, ''i'') 
    return p.replace(regex, REPLACEMENT);
  ';

I need to clean the data with several regex replacements. What I'm trying to do is create a dictionary that will hold the regex pattern and replacement string that will be used to loop through and execute the regex_replace() function. In other words, run REGEX_REPLACE() function inside the SQL query itself over and over again until it goes through the entire dictionary.

//Key:value rule for regex_pattern_dict={'THE PATTERN HERE':'WHAT YOU WANT TO REPLACE IT WITH'}

regex_pattern_dict ={
     '[0-9]{2}[\/\.][0-9]{2}':''                 -- remove dates i.e. "08/02"
    ,'[0-9]{3}[-][0-9]{3}[-][0-9]{4}':''         -- remove phone numbers i.e. "888-957-4675"
    ,'[+][0-9]{11}': ''                          -- remove phone numbers i.e. "+18882467822"
     
};

In other words, in the SQL query itself I'm trying to create a loop with the Javascript REGEX_REPLACE function and dictionary (dictionary is also in javascript).

I guess what I'm trying to do is something like this:


Select 
 COLUMN as RAW_COLUMN
--,REGEX_REPLACE(COLUMN,'REGEX_PATTERN_HERE','REPLACEMENT_STRING_HERE') as CLEAN_COLUMN
,for (const [PATTERN, REPLACEMENT] of Object.entries(regex_pattern_dict)) {
  REGEX_REPLACE(COLUMN,PATTERN,REPLACEMENT);
} AS CLEAN_COLUMN
FROM TABLE




Any ideas how to approach this? Not sure how to put a javascript loop in SQL.


With the help of @David Garrison the below code was the solution:



CREATE or replace PROCEDURE TABLE_CLEAN()
RETURNS VARCHAR
LANGUAGE javascript
AS
$$

//Regex Cleaning Dictionary
const regex_pattern_dict ={

    "[0-9]{2}[\/\.][0-9]{2}":""                 // remove dates i.e. "08/02"
    ,"[0-9]{3}[-][0-9]{3}[-][0-9]{4}":""         // remove phone numbers i.e. "888-957-4675"
    ,"[+][0-9]{11}": ""                          // remove phone numbers i.e. "+18882467822"
};

//Column to Clean
var RegexStr = "COLUMN";

//Key = Regex Pattern, Value = Replacement
for (const [key, value] of Object.entries(regex_pattern_dict)) {
   RegexStr = "REXP_REPLACE_ME(" + RegexStr + ",'" + key + "','" + value + "')"

}


var rs = snowflake.execute( { sqlText:
`
  CREATE OR REPLACE VIEW  DATABASE.SCHEMA.TABLE AS
  SELECT 
    COLUMN
    , ${RegexStr} as clean_column

  FROM  DATABASE.SCHEMA.TABLE_ORIGINAL
`

                            } );

  $$;


  call TABLE_CLEAN();



select * from  DATABASE.SCHEMA.TABLE




Greg Pavlik
  • 10,089
  • 2
  • 12
  • 29
mikelowry
  • 1,307
  • 4
  • 21
  • 43
  • 1
    If you had the set of patterns in a table, you should be able to use a recursive CTE to build up the results while traversing the set – Simeon Pilgrim Oct 06 '21 at 22:38
  • 2
    Since you're using backticks, it can help ease writing and improve readability to use template literals instead of string concatenation. Instead of using + RegexStr + you can use ${RegexStr} and don't have to close and reopen the string using backticks. – Greg Pavlik Oct 07 '21 at 03:15
  • @GregPavlik so it would just be ```RegexStr = REXP_REPLACE_ME(${RegexStr},Key,Value)``` ? – mikelowry Oct 07 '21 at 16:16
  • 1
    Different line. It's difficult to put into the comments section and I don't want to add an answer. I'll just change the text in your question so you can see the usage of template literals. Refer to the line with `+ RegexStr + as clean_column` in it. It's in the final code block near the end. – Greg Pavlik Oct 07 '21 at 16:20

1 Answers1

1

updated with more correct syntax. Full solution is above in the edited question

var RegexStr = "COLUMN";

for (const [key, value] of Object.entries(regex_pattern_dict)) {
   RegexStr = "REXP_REPLACE(" + RegexStr + ",'" + key + "','" + value + "')"

}


sql = `select 
           raw_column,
           ` + RegexStr + `as clean_column
       from table`

sql_statement = snowflake.createStatement({sqlText: sql });
result_set = sql_statement.execute();

Note: This may require using a procedure instead of a function.

David Garrison
  • 2,546
  • 15
  • 25
  • oh, interesting, let me attempt to piece this together on my end. Really appreciate the guidance! – mikelowry Oct 06 '21 at 22:35
  • I wish I could get more syntactically correct with the answer, this is on the edge of my knowledge with both snowflake functions and javascript loops, but I think the concept of building out the sql dynamically is a reasonable approach. I'll be curious to hear if it works out, and what you come up with. – David Garrison Oct 06 '21 at 22:39
  • 1
    Tweaked a few things, but worked like a charm! Posted the answer above for reference. – mikelowry Oct 06 '21 at 23:22