How to bypass the one capturing group of regular expression on bigquery for multiple use-cases

Question

I'm using GBQ to try and create a regexp to extract just the "tn" value from the below URL examples. The "tn" key could be anywhere in the URL after the "event?" and could technically be any value as well. However, the regexp I'm using (shown below) utilizes two groups and GBQ only allows one so I'm looking for assistance to see how I can make it

I also tried splitting the below URL to try and extract the value from the array but couldn't figure out where to go from there. I feel like the splitting it may be easier, but I want to be sure it ends up not duplicating the data if I unnest it (ideally I wouldn't need to unnest it).

https://google.com/d/track/event?upid=12345&url=https://www.mywebsite.com/&tn=o_rnpp4126&val=100.00&cache_buster=15&ps=2

https://google.com/d/track/event?upid=12345&url=https://www.mywebsite.com/&val=100.00&cache_buster=15&ps=2&tn=o_rnpp4126

tn=(.+?)(&|$)

select
split(split('https://google.com/d/track/event?upid=12345&url=https://www.mywebsite.com/&val=100.00&cache_buster=15&ps=2&tn=o_rnpp4126','event?')[ordinal(2)],'&')

You can simply use `tn=([^&]+)`. Also use `\btn=([^&]+)` if it allowed in big query, to avoid matching something like `smtn=value`. — markalex, Apr 12 '23 at 16:40

score 1 · Accepted Answer · answered Apr 12 '23 at 18:32

To match everything till the next & or end of line without second group, (and I assume without lookarounds) you can simply use tn=([^&]+).

It will automatically stop when encounters &.

Also, to avoid possibility of matching something like smtn=value, you could use \btn=([^&]+). It uses word boundary \b to exclude matching of word ending with tn.

Alternatively you could use non-capturing group: tn=(.+?)(?:&|$)

How to bypass the one capturing group of regular expression on bigquery for multiple use-cases

1 Answers1