0

What is the regular expression query to get character or string after nth occurrence of pipeline | symbol in ORACLE? For example I have two strings as follows,

 Jack|Sparrow|17-09-16|DY7009|Address at some where|details       
 |Jack|Sparrow|17-09-16||Address at some where|details

I want 'DY7009' which is after 3rd pipeline symbol starting from 1st position, So what will be regular expression query for this? And in second string suppose that 1st position having | symbol, then I want 4th string if there is no value then it should give NULL or BLANK value.

select regexp_substr('Jack|Sparrow|17-09-16|DY7009|Address at some where|details'
        ,' ?? --REX Exp-- ?? ') as col 
from dual;

Result - DY7009

select regexp_substr('Jack|Sparrow|17-09-16|DY7009|Address at some where|details'
         ,' ?? --REX Exp-- ?? ') as col 
 from dual;

Result - '' or (i.e. NULL)

So what should be the regexp? Please help. Thank you in Advance

NEW UPDATE Edit ---

Thank you all guys!!, I appreciate your answer!!. I think, I didn't ask question right. I just want a regular expression to get 'string/character string' after nth occurrence of pipeline symbol. I don't want to replace any string so only regexp_substr will do the job.

----> If 'Jack|Sparrow|SQY778|17JULY17||00J1' is a string

I want to find string value after 2nd pipe line symbol here the answer will be SQY778. If i want to find string after 3rd pipeline symbol then answer will be 17JULY17. And if I want to find value after 4th pipeline symbol then it should give BLANK or NULL value because there is nothing after 4th pipeline symbol. If I want to find string 5th symbol then I will only replace one digit in Regular expression i.e. 5 and I will get 00J1 as a result.

  • Check my answer below.Note that In the second query that u have provided, the string used is not same as mentioned at the beginning. I am assuming you need || to be treated as blank(NULL) – Kaushik Nayak Jul 24 '17 at 06:38
  • Check the last edit to my answer. It should work.Let me know if u have any scenarios where its not working. – Kaushik Nayak Jul 25 '17 at 05:09

4 Answers4

3

Here ya go. Replace the 4th argument to regexp_substr() with the number of the field you want.

with tbl(str) as (
  select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select regexp_substr(str, '(.*?)(\||$)', 1, 4, NULL, 1) field_4
from tbl;

FIELD_4
--------

DY7009

SQL>

To list all the fields:

with tbl(str) as (
  select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select regexp_substr(str, '(.*?)(\||$)', 1, level, NULL, 1) split
from tbl
connect by level <= regexp_count(str, '\|')+1;

SPLIT
-------------------------

Jack
Sparrow
17-09-16
DY7009
Address at some where
details

6 rows selected.

SQL>

So if you want select fields you could use:

with tbl(str) as (
      select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
    )
    select 
      regexp_substr(str, '(.*?)(\||$)', 1, 1, NULL, 1) first,
      regexp_substr(str, '(.*?)(\||$)', 1, 2, NULL, 1) second,
      regexp_substr(str, '(.*?)(\||$)', 1, 3, NULL, 1) third,
      regexp_substr(str, '(.*?)(\||$)', 1, 4, NULL, 1) fourth
    from tbl;

Note this regex handles NULL elements and will still return the correct value. Some of the other answers use the form '[^|]+' for parsing the string but this fails when there is a NULL element and should be avoided. See here for proof: https://stackoverflow.com/a/31464699/2543416

Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • Very Close! but output changes enter image description here For 1st query I should get NULL. for 2nd query I should get DY&009 and For 3rd query I should get Address at some where. –  Jul 24 '17 at 15:52
  • Your output changes because the pipes are in different places, hence the fields are in a different order. First you need to get your data in a consistent layout. Every one of your examples is different. Please run the last example above, pasting in the different examples and you will see. Solve your data consistency issue before tackling the parsing issue. – Gary_W Jul 24 '17 at 17:31
  • @Monali. Check the last edit to my answer. It should work.Let me know if u have any scenarios where its not working. – Kaushik Nayak Jul 25 '17 at 05:09
0

You can use regex_replace to get the nth matching group. In your example, the fourth match could be retrieved like this:

select regexp_replace(
    'Jack|Sparrow|17-09-16|DY7009|Address at some where|details',
    '^([^\|]*\|){3}([^\|]*)\|.*$',
    '\4'
) as col
from dual;

Edit: Thanks Arijit Kanrar for pointing out the missing escape characters.

To OP: regex_replace doesn't replace anything in the database, only in the returned string.

0

You can use this query to get the value at the specific column ( nth occurrence ) as follows

SELECT nth_string
FROM
  (SELECT TRIM (REGEXP_SUBSTR (long_string, '[^|]+', 1, ROWNUM) ) nth_string ,
    level AS lvl
  FROM
    (SELECT REPLACE('Jack|Sparrow|17-09-16|DY7009|Address at some where|details','||','| |') long_string
    FROM DUAL
    )
    CONNECT BY LEVEL <= REGEXP_COUNT ( long_string, '[^|]+')
  )
WHERE lvl = 4;

Note that i am using the standard query in oracle to split a delimited string into records. To handle blank between delimiters as in your second case, i am replacing it with a space ' ' . The space gets converted to NULL after applying TRIM() function.

You can get any nth record by replacing the number in lvl = at the end of the query.

Let me know your feedback. Thanks.

EDIT:

It seems to not work with purely regexp_substr() as there is no way to convert blank between '||' to Oracle NULL .So intermediate TRIM() was required and i am adding a replace to make it easier. There will be patterns to directly match this scenario, but could not find them.

Here are all scenarios for 4th occurence .

WITH t
     AS (SELECT '|Jack|Sparrow|SQY778|17JULY17||00J1' long_string
         FROM   dual
         UNION ALL
         SELECT 'Jack|Sparrow|SQY778|17JULY17||00J1' long_string
         FROM   dual
         UNION ALL
         SELECT '||Jack|Sparrow|SQY778|17JULY17|00J1' long_string
         FROM   dual)
SELECT long_string,
       Trim (Regexp_substr (mod_string, '\|([^|]+)', 1, 4, NULL, 1)) nth_string
FROM   (SELECT long_string,
               Replace(long_string, '||', '| |') mod_string
        FROM   t)  ;

    LONG_STRING                             NTH_STRING
   ------------------------                -----------
    |Jack|Sparrow|SQY778|17JULY17||00J1     17JULY17
    Jack|Sparrow|SQY778|17JULY17||00J1      NULL 
    ||Jack|Sparrow|SQY778|17JULY17|00J1     SQY778

EDIT2: Finally a pattern that gives the solution.Thanks to Gary_W

To get the nth occurence from the string , use:

WITH t
     AS (SELECT '|Jack|Sparrow|SQY778|17JULY17||00J1' long_string
         FROM   dual
         UNION ALL
         SELECT 'Jack|Sparrow|SQY778|17JULY17||00J1' long_string
         FROM   dual
         UNION ALL
         SELECT '||Jack|Sparrow|SQY778|17JULY17|00J1' long_string
         FROM   dual)
SELECT long_string,
       Trim (regexp_substr (long_string, '(.*?)(\||$)', 1, :n + 1, NULL, 1)) nth_string
       FROM t;
Kaushik Nayak
  • 30,772
  • 5
  • 32
  • 45
  • Thanks! please look after NEW UPDATE Edit --- in question I have updated it. –  Jul 24 '17 at 13:53
  • `'[^|]+'` does not work with NULL elements as you found out. Use: `select regexp_substr(long_string, '(.*?)(\||$)', 1, 4, NULL, 1)` . Note the strings starting with a pipe really have a NULL element in front of the pipe. Bad data is really the problem here. – Gary_W Jul 24 '17 at 17:57
  • It works if we add :n + 1 for nth occurrence. .i.e. for 4th occurrence , it should be 5. Thanks anyway! – Kaushik Nayak Jul 25 '17 at 05:12
0

Don't have enough reputation to comment on Chris Johnson's answer so adding my own. Chris has the correct approach of using back-references but forgot to escape the Pipe character. The regex will look like this.

    WITH dat
     AS (SELECT 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details' AS str,
                3 AS pos
         FROM   DUAL
         UNION
         SELECT ' |Jack|Sparrow|17-09-16||Address at some where|details' AS str,
                4 AS pos
         FROM   DUAL)
SELECT str,
       pos,
       REGEXP_REPLACE (str, '^([^\|]*\|){' || pos || '}([^\|]*)\|.*$', '\2')
          AS regex_result
FROM   dat;

I'm creating the regex dynamically by adding the position of the Pipe character dynamically.

The result looks like this.

|Jack|Sparrow|17-09-16||Address at some where|details (4):

Jack|Sparrow|17-09-16|DY7009|Address at some where|details (3): DY7009

Arijit Kanrar
  • 451
  • 3
  • 15
  • Thank you, I appreciate your answer. I think, I didn't ask question right. I just want a regular expression to get 'string/character string' after nth occurrence of pipeline symbol. I don't want to replace any string so only regexp_substr will do the job. –  Jul 24 '17 at 13:44