-2

I have the below string that I need help pulling an ID from in Presto. Presto uses the javascript regex. I've searched multiple options including:

JavaScript text between double quotes

Javascript regex to extract all characters between quotation marks following a specific word

I need to pull the GA Client ID which looks like this: 75714ae471df63202106404675dasd800097erer1849995367

Below is a snipped where it sits in the string.

The struggle is that the "s:38:" is not constant. The number can be anything. For example, it could be s:40: or s:1000: etc. I need it to return just the alphanumeric id.

String Snippet

"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";

Full string listed below

99524";s:9:"FirstName";s:2:"John";s:8:"LastName";s:8:"Doe";s:7:"Company";s:10:"Sample";s:5:"Email";s:20:"xxxxx@gmail.com";s:5:"Phone";s:10:"8888888888";s:7:"Country";s:13:"United States";s:5:"Title";s:8:"Creative";s:5:"State";s:2:"NC";s:13:"Last_Asset__c";s:40:"White Paper: Be a More Strategic Partner";s:16:"Last_Campaign__c";s:18:"70160000000q6TgAAI";s:16:"Referring_URL__c";s:8:"[direct]";s:19:"leadPriorityMarketo";s:2:"P2";s:18:"ProductInterest__c";s:9:"sample";s:14:"landingpageurl";s:359:"https://www.sample.com;mkt_tok=samplesamplesamplesample";s:14:"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";s:13:"Drupal_SID__c";s:36:"e1380c07-0258-47de-aaf8-82d4d8061e1a";s:4:"form";s:4:"1046";} ```


Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
cauthon
  • 161
  • 1
  • 10

1 Answers1

-1

This works for your sample

"GA_ClientID__c";[^"]*"([^"]*)"

https://regex101.com/r/Q4Orj6/1

  • This pulls in the entire string starting from GA_ClientID like below: "GA_ClientID__c";s:38:"18987900507222330191204445318077206127" "GA_ClientID__c";s:38:"30056129415927066493015279932494484849" "GA_ClientID__c";s:38:"45334344195681961193291818769152505335" "GA_ClientID__c";s:38:"42976399188084801382077697423679114357" – cauthon Sep 06 '19 at 16:39
  • I can see it working in the tester but when I run it, its still returning the full string. Here's the documentation for Presto's regex implementation: https://prestodb.github.io/docs/current/functions/regexp.html Let me see if I can figure out what they changed – cauthon Sep 06 '19 at 16:47
  • @cauthon - Presto -> `All of the regular expression functions use the Java pattern syntax` -> `"\"GA_ClientID__c\";[^\"]*\"([^\"]*)\""` ? –  Sep 06 '19 at 16:50
  • Your answer is correct. I was missing the group flag in the SQL. regexp_extract(a.attributes_value, 'GA_ClientID__c\";[^\"]*\"([^\"]*)',1) Thank you!!! – cauthon Sep 06 '19 at 17:28
  • @cauthon - Even though it might work this way `GA_ClientID__c";[^"]*"([^"]*)` it's probably better to keep the begin and end closure's either `"\"GA_ClientID__c\";[^\"]*\"([^\"]*)\""` or `'\"GA_ClientID__c\";[^\"]*\"([^\"]*)\"'` –  Sep 06 '19 at 17:42