0

I have something like below stored in a table column. I need only 133 extracted from this.

015.133.Governmental Affairs

When I do

select regexp_substr('015.133.Governmental Affairs', '\.*+[[:digit:]]+*',1,2) from dual;

The result is .133

If I do

regexp_substr('015.133.Governmental Affairs', '\*+[[:digit:]]+*',1,2)

it returns nothing. What's correct expression here?

Gary_W
  • 9,933
  • 1
  • 22
  • 40
Dito
  • 33
  • 3
  • 1
    The best way would be not to use regexp in the first place. Assuming the strings always have that format it's easy enough to find the first dot, then the second dot and return everything in between these points. – Alejandro Sep 09 '22 at 17:50
  • In your first example, you are specifying to match the literal period, followed by numbers. 133 happens to follow the first period (the asterisk is redundant). In the second, you are trying to match a literal asterisk, which is not in your string. – Gary_W Sep 09 '22 at 18:48

2 Answers2

1

The trick with coming up with a good regex is to be able to explain it in plain language first.

Editing to explain better hopefully.

Here I am matching zero or more digits where followed by a literal period. The 4th argument to REGEXP_SUBSTR (2) is which occurrence of this pattern to match on. Note the pattern consists of 2 groups as defined by being surrounded by parentheses. The 6th argument to REGEXP_SUBSTR says when a match is found to return the 1st subgroup (the numbers, not the period), if you put a 2 there you'd get the period that follows the number 133.

SELECT REGEXP_SUBSTR('015.133.Governmental Affairs', '([[:digit:]]*?)(\.)', 1, 2, NULL, 1) AS nbr
FROM dual;


NBR
---
133
1 row selected.
Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • This is exactly what I wanted. Is there any article which explains how this works? – Dito Sep 09 '22 at 19:30
  • @Dito Since you are asking, I did not explain good enough. I will attempt to edit with a better explanation :-) – Gary_W Sep 09 '22 at 19:57
0

Here's something adapted from this question: How to extract group from regular expression in Oracle?

SELECT REGEXP_REPLACE(
    '015.133.Governmental Affairs',
    '^[[:digit:]]+\.([[:digit:]]+)\..*',
    '\1'
) FROM DUAL;

The regex looks for a string that starts with a series of digits, then ., then more digits, then another ., then the rest of the string. It then replaces the entire match (which is the entire string) with \1, which is whatever was in that second set of digits, inside the parentheses.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
  • The problem here is if the string is like this 'HPI Bioethics' , it's returning 'HPI Bi'. Rather it should return null. – Dito Sep 09 '22 at 19:25
  • @Dito Note if REGEXP_REPLACE cannot match the pattern, it returns the original string, where if REGEXP_SUBSTR cannot find a match, it returns NULL. I suspect your have a COLUMN statement or otherwise somehow restrict the length of the result to 6? – Gary_W Sep 09 '22 at 19:55
  • @Dito No, it's doesn't. Perhaps you didn't use the REGEXP_REPLACE in my example and instead kept REGEXP_SUBSTR. – Andy Lester Sep 09 '22 at 20:46