3

Similarly to this question...

How can I use regex to split a string, using a string as a delimiter?

... I'm trying to split the following string:

Spent 30 CAD in movie tickets at Cineplex on 2018-06-01

My desired output is this:

ELEMENT ELEMENT_VALUE
------- -------------
      1 Spent
      2 30
      3 CAD
      4 movie tickets
      5 Cineplex
      6 2018-06-01

Similarly, it should be able to process:

Paid 600 EUR to Electric Company

Producing:

ELEMENT ELEMENT_VALUE
------- -------------
      1 Paid
      2 600
      3 EUR
      4 
      5 Electric Company

I've tried this regular expression to no avail:

(\w+)(\D+)(\w+)(?(?=in)(\w+)(at)(\w+)(on)(.?$)|((?=to)(\w+)(.?$)))

I've looked on a couple of regular expression websites plus this post without much luck:

Extract some part of text separated by delimiter using regex

Could someone please help?

William Robertson
  • 15,273
  • 4
  • 38
  • 44
Jaquio
  • 183
  • 1
  • 11
  • Oracle does not support non-capturing groups or look-ahead in regular expressions. – MT0 May 31 '18 at 22:56

2 Answers2

4

Here's a simple SQL tokenizer that breaks on a space:

select regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01','[^ ]+', 1, level) from dual
connect by regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01', '[^ ]+', 1, level) is not null

From: https://blogs.oracle.com/aramamoo/how-to-split-comma-separated-string-and-pass-to-in-clause-of-select-statement

eaolson
  • 14,717
  • 7
  • 43
  • 58
0

There are two problems with your required output. The first is how to define the tokens you want to exclude ('on', 'at', etc). The second is how to ignore the space in certain tokens ('Electric Company', 'movie tickets').

It's easy enough to solve point one with a two-step process. Step #1 splits the string on spaces, step #2 removes the unwanted tokens:

with exclude as (
  select 'in' as tkn from dual union all
  select 'at' as tkn from dual union all
  select 'to' as tkn from dual union all
  select 'on' as tkn from dual 
  )
  , str as (
    select id
           , level as element_order
           , regexp_substr(txt, '[^ ]+', 1, level) as tkn
    from t23
    where id = 10
    CONNECT BY level <= regexp_count(txt, '[^ ]+')+1
    and id = prior id
    and prior sys_guid() is not null
    )
 select row_number() over (partition by str.id order by str.element_order) as element
       , str.tkn as element_value
 from str
      left join exclude on exclude.tkn = str.tkn
 where exclude.tkn is null
 and str.tkn is not null
 ;

Here is a SQL Fiddle demo.

The second point is pretty hard to solve. I guess you'll need another look-up table to identify the ringers, and maybe use listagg() to concatenate them.

APC
  • 144,005
  • 19
  • 170
  • 281