1

I'm working with this query:

SELECT
    REGEXP_SUBSTR ('test1   test2   test3', '[^\t]+', 1, 1) field1,
    REGEXP_SUBSTR ('test1   test2   test3', '[^\t]+', 1, 2) field2,
    REGEXP_SUBSTR ('test1   test2   test3', '[^\t]+', 1, 3) field3
FROM DUAL

I'm looking for each field to have each value. this is a tab-delimited string. But I'm getting random outputs to it.

Looking for:

field1    field2    field3
test1     test2     test3

Getting:

field1    field2    field3
es        es        es

what am I missing?

thanks

arcee123
  • 101
  • 9
  • 41
  • 118
  • 2
    By the way and just FYI, the Posix pattern `'[^[:space:]]+'` works for both tabs and spaces(even carriage return, newline etc). – Kaushik Nayak Mar 22 '19 at 17:26

2 Answers2

4

[^\t]+ will match characters that are not a backslash and not a t character.

So

SELECT
    REGEXP_SUBSTR ('aaatbbb\ccc', '[^\t]+', 1, 1) field1,
    REGEXP_SUBSTR ('aaatbbb\ccc', '[^\t]+', 1, 2) field2,
    REGEXP_SUBSTR ('aaatbbb\ccc', '[^\t]+', 1, 3) field3
FROM DUAL

Will output:

FIELD1 | FIELD2 | FIELD3
:----- | :----- | :-----
aaa    | bbb    | ccc   

Having split the string on \ and t

If you want to split it on a tab character then:

SELECT
    REGEXP_SUBSTR ('test1' || CHR(9) || 'test2' || CHR(9) || 'test3', '[^' || CHR(9) || ']+', 1, 1) field1,
    REGEXP_SUBSTR ('test1' || CHR(9) || 'test2' || CHR(9) || 'test3', '[^' || CHR(9) || ']+', 1, 2) field2,
    REGEXP_SUBSTR ('test1' || CHR(9) || 'test2' || CHR(9) || 'test3', '[^' || CHR(9) || ']+', 1, 3) field3
FROM DUAL

Which outputs:

FIELD1 | FIELD2 | FIELD3
:----- | :----- | :-----
test1  | test2  | test3 

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
2

Characters inside the square brackets are interpreted as literal characters, not escape characters - so you're matching anything except 't' or '\' rather than tabs.

You could embed an actual tab character using concatenation:

SELECT
    REGEXP_SUBSTR ('test1   test2   test3', '[^'||chr(9)||']+', 1, 1) field1,
    REGEXP_SUBSTR ('test1   test2   test3', '[^'||chr(9)||']+', 1, 2) field2,
    REGEXP_SUBSTR ('test1   test2   test3', '[^'||chr(9)||']+', 1, 3) field3
FROM DUAL;

FIELD FIELD FIELD
----- ----- -----
test1 test2 test3

You may have problems with adjacent tabs though:

SELECT
    REGEXP_SUBSTR ('test1       test3', '[^'||chr(9)||']+', 1, 1) field1,
    REGEXP_SUBSTR ('test1       test3', '[^'||chr(9)||']+', 1, 2) field2,
    REGEXP_SUBSTR ('test1       test3', '[^'||chr(9)||']+', 1, 3) field3
FROM DUAL;

FIELD FIELD FIELD3
----- ----- ------
test1 test3       

A safer pattern is:

SELECT
    REGEXP_SUBSTR ('test1       test3', '(.*?)('||chr(9)||'|$)', 1, 1, null, 1) field1,
    REGEXP_SUBSTR ('test1       test3', '(.*?)('||chr(9)||'|$)', 1, 2, null, 1) field2,
    REGEXP_SUBSTR ('test1       test3', '(.*?)('||chr(9)||'|$)', 1, 3, null, 1) field3
FROM DUAL;

FIELD FIELD2 FIELD
----- ------ -----
test1        test3

db<>fiddle

Alex Poole
  • 183,384
  • 11
  • 179
  • 318