2

I'm working with some data which is stored in a VARCHAR2(4000) column and the data is mainly text comments but also contains dates within the text. I've written a query which uses SUBSTR and INSTR to pattern match and find a leading Text value prior to the date and then SUBST to return the date value which I then convert to a date using TO_DATE. This works well however I have a number of records that contain multiple comments and as such multiple dates. Using the above method I'm only able to specify the (n)th occurrence of this pattern, is there a way to return ALL dates when matched rather than only a single occurrence?

Here is an example of the data within the varchar column;

LOCKED ENTITY: ACCOUNT
LOCKED BY USER: ops
LOCKED AT: 31/05/2004 11:47
CUST NOTES: <Please enter explanation here>
Customer notes are entered here.

UNLOCKED ENTITY: ACCOUNT
UNLOCKED BY USER: ops
UNLOCKED AT: 31/05/2004 11:49
UNLOCK NOTES: <Please enter explanation here>
Test

LOCKED ENTITY: USER
LOCKED BY USER: ops
LOCKED AT: 31/05/2004 11:50
LOCK NOTES: <Please enter explanation here>
Test

UNLOCKED ENTITY: USER
UNLOCKED BY USER: ops
UNLOCKED AT: 24/08/2009 16:47
UNLOCKED NOTES: <Please enter explanation here>

Here's a simplified version of the query I'm using (All other irrelevant joins and columns removed for clarity);

select substr(VALUE, INSTR(VALUE,'LOCKED AT: ',1)+11, 10) as "DATE"
  from tableA a
  join tableB b
    on a.id = b.id
 where regexp_like (VALUE ,'ABC|DEF|GHI');  

DATE
----------
31/05/2004

For the above Query I'd like to return all dates that follow the string 'LOCKED AT: ' eg;

 DATE
----------
31/05/2004
31/05/2004

Any help would be appreciated. For info the DB version is Oracle 10g, I attempted to use REGEXP_COUNT to achieve something I had in my mind but only works with 11g onward.

Many Thanks

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
user7329642
  • 67
  • 1
  • 6

3 Answers3

3

You can use regexp_substr with connect by level to get all matches:

Providing Sample Data:

WITH tableA(VALUE) AS (SELECT 'LOCKED ENTITY: ACCOUNT
LOCKED BY USER: ops
LOCKED AT: 31/05/2004 11:47
CUST NOTES: <Please enter explanation here>
Customer notes are entered here.

UNLOCKED ENTITY: ACCOUNT
UNLOCKED BY USER: ops
UNLOCKED AT: 31/05/2004 11:49
UNLOCK NOTES: <Please enter explanation here>
Test

LOCKED ENTITY: USER
LOCKED BY USER: ops
LOCKED AT: 31/05/2004 11:50
LOCK NOTES: <Please enter explanation here>
Test

UNLOCKED ENTITY: USER
UNLOCKED BY USER: ops
UNLOCKED AT: 24/08/2009 16:47
UNLOCKED NOTES: <Please enter explanation here>' FROM dual)

Query:

select ltrim( regexp_substr(VALUE,
                            '^LOCKED AT: ([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}\.?)',1,LEVEL,'m'), 
                            'LOCKED AT: ') as "Dates Locked" 
  FROM tableA
 CONNECT BY regexp_substr(VALUE,
                            '^LOCKED AT: ([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}\.?)',1,LEVEL,'m') IS NOT null

Result:

Dates Locked
------------
31/05/2004
31/05/2004

The problem with this query is, that the connect by is performed for the complete table - to avoid this one possible solution is to seperate this part and perform it for any datarow indipendly:

SELECT ID, ltrim( regexp_substr(VALUE,
                            '^LOCKED AT: ([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}\.?)',1,lvl.column_value,'m'), 
                            'LOCKED AT: ') as "Dates Locked" 
  FROM tableA
  CROSS JOIN table(cast(multiset
    (select level from dual
    connect by regexp_substr(VALUE,
                            '^LOCKED AT: ([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}\.?)',1,LEVEL,'m') IS NOT null) 
    as sys.odcinumberlist)) lvl
Radagast81
  • 2,921
  • 1
  • 7
  • 21
  • Unfortunately this doesn't appear to work, I'm using 10g and when I attempt to run this it hangs indefinitely. I've given it a single ID in the where clause but I eventually killed the session after 30+ mins. – user7329642 Oct 29 '19 at 12:54
  • Problem was that i testet it only for a table with a single record, for multiple records the `connect by` has to be performed for any data row independly or you can give an universal upper bound `connect by level <= 5` will only give you the first 5 `UNLOCKED`-entries per id. – Radagast81 Oct 29 '19 at 13:23
2

You can use regexp_substr(), regexp_count(), regexp_like() and regexp_instr() regular expression functions together with substr() string operator function :

with tab as
(
select 
       regexp_substr(value,
              'LOCKED AT: ([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}\.?)',1,level)
              as Dates_Locked,
       regexp_instr(value,
              'LOCKED AT: ([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}\.?)',1,level)      
              as Pos_Unlocked,
       value 
  from tableA a 
  join tableB b 
    on a.id = b.id
connect by level <= regexp_count(value,'LOCKED AT: ') 
)
select ltrim( Dates_Locked, 'LOCKED AT: ' ) as Dates_Locked
  from tab     
 where not regexp_like(substr(value,Pos_Unlocked-2,2),'UN','i');

 Dates Locked
 ------------
 31/05/2004
 31/05/2004

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
  • Although originally accepted as the answer I understand now that this will not fit my requirement. When looking at the Demo the Single string is inserted into the table as 4 separate records. – user7329642 Oct 29 '19 at 12:53
0

You can do it all without regular expressions and just using simple string functions:

Oracle Setup:

CREATE TABLE test_data ( value ) AS
SELECT 'LOCKED ENTITY: ACCOUNT
LOCKED BY USER: ops
LOCKED AT: 31/05/2004 11:47
CUST NOTES: <Please enter explanation here>
Customer notes are entered here.

UNLOCKED ENTITY: ACCOUNT
UNLOCKED BY USER: ops
UNLOCKED AT: 31/05/2004 11:49
UNLOCK NOTES: <Please enter explanation here>
Test

LOCKED ENTITY: USER
LOCKED BY USER: ops
LOCKED AT: 31/05/2004 11:50
LOCK NOTES: <Please enter explanation here>
Test

UNLOCKED ENTITY: USER
UNLOCKED BY USER: ops
UNLOCKED AT: 24/08/2009 16:47
UNLOCKED NOTES: <Please enter explanation here>' FROM DUAL;

Query:

Spilt the string first by double carriage returns (using a recursive sub-query) and then you can find the index of the : and carriage return separators and use those to extract the data.

WITH note_bounds ( value, start_pos, end_pos ) AS (
  SELECT REPLACE( value, CHR(13) ),
         1,
         INSTR( REPLACE( value, CHR(13) ), CHR(10) || CHR(10), 1 )
  FROM   test_data
UNION ALL
  SELECT value,
         end_pos + 2,
         INSTR( value, CHR(10) || CHR(10), end_pos + 2 )
  FROM   note_bounds
  WHERE  end_pos > 0
),
notes ( note ) AS (
  SELECT CASE end_pos
         WHEN 0
         THEN SUBSTR( value, start_pos )
         ELSE SUBSTR( value, start_pos, end_pos - start_pos )
         END
  FROM   note_bounds
),
note_data_bounds ( note, sep1, end1, sep2, end2, sep3, end3, sep4 ) AS (
  SELECT note,
         INSTR( note, ':', 1, 1 ),
         INSTR( note, CHR(10), 1, 1 ),
         INSTR( note, ':', 1, 2 ),
         INSTR( note, CHR(10), 1, 2 ),
         INSTR( note, ':', 1, 3 ),
         INSTR( note, CHR(10), 1, 3 ),
         INSTR( note, ':', 1, 5 )
  FROM   notes
)
SELECT SUBSTR( note, 1, sep1 - 8 ) AS action,
       SUBSTR( note, sep1 + 2, end1 - sep1 - 2 ) AS entity,
       SUBSTR( note, sep2 + 2, end2 - sep2 - 2 ) AS actor,
       TO_DATE( SUBSTR( note, sep3 + 2, end3 - sep3 - 2 ), 'DD/MM/YYYY HH24:MI' ) AS datetime,
       SUBSTR( note, sep4 + 2 ) AS notes
FROM   note_data_bounds;

Output:

ACTION   | ENTITY  | ACTOR | DATETIME            | NOTES                                                              
:------- | :------ | :---- | :------------------ | :------------------------------------------------------------------
LOCKED   | ACCOUNT | ops   | 2004-05-31 11:47:00 | <Please enter explanation here><br>Customer notes are entered here.
UNLOCKED | ACCOUNT | ops   | 2004-05-31 11:49:00 | <Please enter explanation here><br>Test                            
LOCKED   | USER    | ops   | 2004-05-31 11:50:00 | <Please enter explanation here><br>Test                            
UNLOCKED | USER    | ops   | 2009-08-24 16:47:00 | <Please enter explanation here>                                    

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • That's pretty impressive! Thanks. I'll give this a try for sure. – user7329642 Oct 25 '19 at 15:01
  • Although this solution looks great, I'm not sure it's suitable for me as it looks like there is a finite amount of values used in the query, unfortunately with this date I'm not sure how many note entries there will be so would need to iterate through all to determine that automatically. – user7329642 Oct 29 '19 at 13:31