Specific patterns in Postgresql

Question

I'm getting familiar with postgres sql, but having some trouble with pattern matching. I read the documentation and looked through other questions, but couldn't solve this on my own.

I have a field with lots of text data, in the middle of it, numbers with this pattern:

"2021-1234567" (four digits + - + seven digits)

Problem is, it can have other number sequences. Like this:

"Project number 12345678912345 with id 2020-2583697 1456" (in this case, i need to extract 2020-2583697)

In some cases it may be just eleven digits, like this:

"Project 12345678912345 sequence 20202583697 1456" (in this case i need to extract 20202583697)

At first i tried to extract only the numbers (the text is mostly user input) with:

SELECT
   SUBSTRING("my_field", '^[0-9]+$' )
FROM 
    my_table

That didn't help at all... Can anyone help me?

Does the piece of text your are looking for always start with `2020` ? — Luuk, Aug 16 '21 at 16:45
It usually stars with some year, but not only 2020 (i have 2016, 2018, 2020, 2021 and many others) — Luan, Aug 16 '21 at 16:48
Or, `select substring('your field','20[12][0-9]-[0-9]+$') from mytable ` will match all values starting with a values between `2010` and `2029`. — Luuk, Aug 16 '21 at 16:49
Have a look at [regex101.com](https://regex101.com). There you can play with regular expressions and get also an explanation. — Steeeve, Aug 16 '21 at 16:58
@Luuk I've noticed when the numbers are atached to text this doens't work (like in project2020-1234567abc) Is this expected? — Luan, Aug 16 '21 at 17:07
@Steeeve i am looking to learn more about pattern matching in sql, but i apreciate the help, will look into it! — Luan, Aug 16 '21 at 17:09
@Luan the patterns here, also yours `^[0-9]+$` are regular expressions ;) Just paste it into the regex field in regex101 and you get the explanation on the right side what it searches. — Steeeve, Aug 16 '21 at 17:19
@Luan: The `$` at the end means that you will look for the end of the string. see (for an overview of those special things): [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) — Luuk, Aug 16 '21 at 17:19

score 1 · Accepted Answer · answered Aug 16 '21 at 17:55

This appears to do what you want:

select substring(str, '[0-9]{4}-?[0-9]{7}')
from (values ('asfasdf 2020-2583697 qererf i0iu0 1234234'),
             ('asfasdf 20202583697 qererf i0iu0 1234234')
             ) v(str)

It searches for 4 digits followed by an optional hyphen followed by 7 digits.

marcothesane · Answer 2 · 2021-08-17T09:38:28.733

Or this, as I could not manage to force checking for blanks around the pattern without returning those blanks otherwise:

WITH 
indata(s) AS (
            SELECT 'Project number 12345678912345 with id 2020-2583697 1456'
  UNION ALL SELECT 'Project 12345678912345 sequence 20202583697 1456'
)
SELECT
  REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1') AS found_token
, s
FROM indata;
 found_token  |                            s                            
--------------+---------------------------------------------------------
 2020-2583697 | Project number 12345678912345 with id 2020-2583697 1456
 20202583697  | Project 12345678912345 sequence 20202583697 1456
(2 rows)

The pattern used - REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1') - means: replace ^.* the beginning of the string and any number of any characters, followed by a blank; then (\d{4}-?\d{7}) four digits, zero or one dash - -?, and seven digits - and the parentheses around it mean: remember this as the first group; finally: .*$ a blank, then any number of any characters till the end of the string - with group 1: \1 .

Specific patterns in Postgresql

2 Answers2