0

In SQL (Bigquery).

I want to use regex_extract function and extract the expression between the 4th and 5th "_"

Example: From this expression:

Bubble_ALLTIERS_30-65+_M_2%Payers100-500_AEO7D_29_1_EN

I want to extract the phrase 2%Payers100-500

Thank you.

Chris
  • 2,254
  • 8
  • 22
ornachshon
  • 95
  • 7
  • Show what you have tried as a [mcve]. For reference there is https://regex101.com/ You may also have to [edit] this question and explain the limits and requirements. Are there always the same number of underscores? How would a human pick out the part you want? If you can't describe the general pattern to a human, you can't to a computer. –  Feb 04 '19 at 15:57

2 Answers2

0

Use

^(?:[^_]*_){4}([^_]+)

See a demo on regex101.com.

Jan
  • 42,290
  • 8
  • 54
  • 79
0

Below is for BigQuery Standard SQL and looks to me an easiest way to do such extraction

SPLIT(phrase , '_')[SAFE_OFFSET(4)] 

You can test it as in example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Bubble_ALLTIERS_30-65+_M_2%Payers100-500_AEO7D_29_1_EN' phrase 
)
SELECT SPLIT(phrase , '_')[SAFE_OFFSET(4)] word 
FROM `project.dataset.table`

with result

Row word     
1   2%Payers100-500  

Another option would be

REGEXP_EXTRACT_ALL(phrase, r'[^_]+')[SAFE_OFFSET(4)]
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230