4

From this string :

gs://analytics_models_from_g_rdd_ga/9g653798d-7e13-4238-55443-6db72ace9911$$$$events_today.csv

I'm trying to extract this text :

9g653798d-7e13-4238-55443-6db72ace9911

and in separate REGEX function events_today

The rolls for the first string:

  1. The string will always end with $$$$.
  2. The string will always start with ga/.
  3. The extracted text length can be changed.

I sew several tutorials and red regex references but still not succeed I thought the solution will be /\\s*(.*?)\\s*$$$$ but is not.

In general, to extract a string between 2 strings

idan
  • 1,508
  • 5
  • 29
  • 60
  • Dollar signs have special meaning,; they are called anchors. You need to escape them like this `\\$\\$\\$\\$` – MonkeyZeus Nov 30 '20 at 20:16

2 Answers2

2

In general, to match any string between two specific chars, you can use

REGEXP_EXTRACT(col, r"<CHAR>(.*?)<CHAR>")

To match between / and $, you can use

REGEXP_EXTRACT(col, r"/(.*?)\$")

Also, see the list of chars that must be escaped in a regex at What special characters must be escaped in regular expressions?

Here, you can use

REGEXP_REPLACE(col, r".*ga/([^/$]*)\${4}.*", r"\1")

See the regex demo.

The identical, but a bit simplified since .* will become redundant, pattern can be used in REGEGEXP_EXTRACT:

REGEXP_EXTRACT(col, r"ga/([^/$]*)\${4}")

Pattern details:

  • .* - any zero or more chars other than line break chars, as many as possible
  • ga/ - ga and a slash
  • ([^/$]*) - Group 1: any zero or more chars other than / and $
  • \${4} - four $ symbols
  • .* - any zero or more chars other than line break chars, as many as possible.

The replacement is the Group 1 backreference, \1, that replaces the whole match with the contents of the group.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • And for the second string `REGEXP_REPLACE(_FILE_NAME, r".*ga/([^/$]*)\${4}(.*)\.csv", r"\2")` THANKS – idan Nov 30 '20 at 20:55
0

for BigQuery Standard SQL

select regexp_extract(txt, r'ga/(.*?)\${4}')    

if to apply to sample from your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230