0

I have data as follows in a column

+----------------------+
|      my_column       |
+----------------------+
| test_PC_xyz_blah     |
| test_PC_pqrs_bloh    |
| test_Mobile_pqrs_bleh|
+----------------------+

How can I extract the following as columns?

+----------+-------+
| Platform | Value |
+----------+-------+
|   PC     |  xyz  |
|   PC     |  pqrs |
|  Mobile  |  pqrs |
+----------+-------+

I tried using REGEXP_SUBSTR

Default first pattern occurrence for platform:

select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)') as platform from table

Getting second pattern occurrence for value:

select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)', 1, 2) as value from table

This isn't working, however. Where am I going wrong?

Quik Tester
  • 347
  • 1
  • 5
  • 16

2 Answers2

2

For Non-empty tokens

select  regexp_substr(my_column,'[^_]+',1,2)    as platform
       ,regexp_substr(my_column,'[^_]+',1,3)    as value

from    my_table
;

For possibly empty tokens

select  regexp_substr(my_column,'^.*?_(.*)?_.*?_.*$',1,1,'',1)  as platform
       ,regexp_substr(my_column,'^.*?_.*?_(.*)?_.*$',1,1,'',1)  as value

from    my_table
;

+----------+-------+
| PLATFORM | VALUE |
+----------+-------+
| PC       | xyz   |
+----------+-------+
| PC       | pqrs  |
+----------+-------+
| Mobile   | pqrs  |
+----------+-------+
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
0

(.*) is greedy by nature, it will match all character including _ character as well, so test_(.*) will match whole of your string. Hence further groups in pattern _(.*)_(.*) have nothing to match, whole regex fails. The trick is to match all characters excluding _. This can be done by defining a group ([^_]+). This group defines a negative character set and it will match to any character except for _ . If you have better pattern, you can use them like [A-Za-z] or [:alphanum]. Once you slice your string to multiple sub strings separated by _, then just select 2nd and 3rd group.

ex:

SELECT REGEXP_SUBSTR( my_column,'(([^_]+))',1,2) as platform, REGEXP_SUBSTR( my_column,'(([^_]+))',1,3) as value from table;

Note: AFAIK there is no straight forward method to Oracle to exact matching groups. You can use regexp_replace for this purpose, but it unlike capabilities of other programming language where you can exact just group 2 and group 3. See this link for example.

Community
  • 1
  • 1
Vijayakumar Udupa
  • 1,115
  • 1
  • 6
  • 15