1

I previously had help here for an Regexp expression in oracle sql which worked great.However, our place is converting to Big Query and the regexp does not seem to be working anymore.

In my tables, i have the following data

WC 12/10 change FC from 24 to 32
W/C 12/10 change fc from 401 to 340
W/C12/10 18-26

This oracle sql would have split the table up to give me the before number (24) and (32) and (12/10).

cast(REGEXP_SUBSTR(Line_Comment, '((\d+ |\d+)(change )?(- |-|to |to|too|too )(\d+))', 1, 1, 'i',2) as Int) as Before,
cast(REGEXP_SUBSTR(Line_Comment, '((\d+ |\d+)(change )?(- |-|to |to|too|too )(\d+))', 1, 1, 'i', 5) as Int) as After,
REGEXP_SUBSTR(Line_Comment, '((\d+)(\/|-|.| )(\d+)(\/|-|.| )(\d+))|(\d+)(\/|-|.| )(\d+)', 1, 1, 'i') as WC_Date,

Totally understand that comments are not consistent and may not work but if it works more than 80% of the time which it has then we are fine with this.

Since moving to big query, I'm getting this error message. In oracle, the tables were in varchar but in big query when they migrated it, its now in strings. Could this be the reason why its broken?Is there anyone who can help with this?This is way over my head.

No matching signature for function REGEXP_SUBSTR for argument types: STRING, STRING, INT64, INT64, STRING, INT64. Supported signatures: REGEXP_SUBSTR(STRING, STRING, [INT64], [INT64]); REGEXP_SUBSTR(BYTES, BYTES, [INT64], [INT64]) at [69:12]

Nick
  • 138,499
  • 22
  • 57
  • 95
  • i suggest you to just present sample of your input data, desired output and logic - so we will be able to better help you! – Mikhail Berlyant Oct 05 '22 at 21:28
  • @MikhailBerlyant hi there, this is what i would like to see. Thanking you in advance. https://i.postimg.cc/527Qgg33/SQl.png – Jonathon Chau Oct 05 '22 at 21:43
  • Did one of these answers solve your problem? If not, could you provide more information to help answer it? – Nick Oct 12 '22 at 06:54

3 Answers3

0

Since google bigquery REGEXP_SUBSTR doesn't support the subexpr parameter of Oracle's REGEXP_SUBSTR, you need to modify your regexes to take advantage of the fact that:

If the regular expression contains a capturing group, the function returns the substring that is matched by that capturing group.

So for each value you are trying to extract, you need to make that the only capturing group in the regex:

cast(REGEXP_SUBSTR(Line_Comment, '(?:(\d+ |\d+)(?:change )?(?:- |-|to |to|too|too )(?:\d+))', 1, 1) as Int) as Before,
cast(REGEXP_SUBSTR(Line_Comment, '(?:(?:\d+ |\d+)(?:change )?(?:- |-|to |to|too|too )(\d+))', 1, 1) as Int) as After,
REGEXP_SUBSTR(Line_Comment, '((?:\d+)(?:\/|-|.| )(?:\d+)(?:\/|-|.| )(?:\d+))|((?:\d+)(?:\/|-|.| )(?:\d+))', 1, 1) as WC_Date,

Note you can substantially simplify your regexes as below:

(\d+) ?(?:change )?(?:-|too?) ?(?:\d+)
(?:\d+) ?(?:change )?(?:-|too?) ?(\d+)
(?:\d+)(?:[\/.-](?:\d+)){1,2}

Regex demos on regex101: numbers, date

Nick
  • 138,499
  • 22
  • 57
  • 95
0

Based on the sample data you provided in the comment section, you can try below query:

with t1 as (
  select 'WC 12/10 change FC from 24 to 32' as Comment
  union all select 'W/C 12/10 change fc from 401 to 340' as Comment,
  union all select 'W/C12/10 18-26' as Comment
)

select Comment,
regexp_extract(t1.Comment, r'(\d+\/\d+)') as WC,
regexp_extract(t1.Comment, r'.+\s(\d{1,3})[\s|\-]') as Before,
regexp_extract(t1.Comment, r'.+[\sto\s|\-](\d{1,3})$') as After
from t1

Output:

enter image description here

Anjela B
  • 1,150
  • 1
  • 2
  • 7
0

Consider below super simple approach

select Comment, 
  format('%s/%s', arr[offset(0)], arr[safe_offset(1)]) as wc,
  arr[safe_offset(2)] as before,
  arr[safe_offset(3)] as after
from your_table, unnest([struct(regexp_extract_all(Comment, r'\d+') as arr)]) 

if applied to sample data in your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230