Regexp to find only 6 digit number in text PostgreSQL

Question

I need to build a query in PostgreSQL and am required to find all text entries that contain a 6 digit number (e.g. 000999, 019290, 998981, 234567, etc). The problem is that the number is not necessary at the begining of the string or at its end.

I tried and didn't work:

[0-9]{6} - returns part of a number with more than 6 digits
(?:(?<!\d)\d{6}(?!\d)) - postgresql does not know about lookbehind
[^0-9][0-9]{6}[^0-9] and variations on it, but to no avail.

Building my own Perl/C function is not really an option as I do not have the skills required. Any idea what regexp could be used or other tricks that elude me at the moment?

EDIT

Input samples:

aa 0011527 /CASA -> should return NOTHING
aa 001152/CASA -> should return 001152
aa001152/CASA -> should return 001152
aa0011527/CASA -> should return NOTHING
aa001152 /CASA -> should return 001152

score 6 · Accepted Answer · edited Jan 24 '13 at 15:10

6

If PostgreSQL supports word boundaries, use \b:

\b(\d{6})\b

Edit:

\b in PostgreSQL means backspace, so it's not a word boundary.

http://www.postgresql.org/docs/8.3/interactive/functions-matching.html#FUNCTIONS-POSIX-REGEXP however, will explain you that you can use \y as a word boundary, as it means matches only at the beginning or end of a word, so

\y(\d{6})\y

should work.

\m(\d{6})\M

should also work.

Full list of word matches in PostgreSQL regex:

Escape  Description
\A      matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from ^)
\m      matches only at the beginning of a word
\M      matches only at the end of a word
\y      matches only at the beginning or end of a word
\Y      matches only at a point that is not the beginning or end of a word
\Z      matches only at the end of the string (see Section 9.7.3.5 for how this differs from $)

New edit:

Based on your edit, you should be able to do this:

(^|[^\d])(\d+)([^\d]|$)

edited Jan 24 '13 at 15:10

Alan Moore

73,866
12
100
156

answered Jan 24 '13 at 14:11

h2ooooooo

39,111
8
68
102

I think the requirement is that string may contain other characters also. – Naveed S Jan 24 '13 at 14:15
@NaveedS Since we didn't use `^` or `$`, it will match at any place in a string. – h2ooooooo Jan 24 '13 at 14:18
@NaveedS Yes, the string may contain other strings. Also, PostgreSQL has following boundaries: http://stackoverflow.com/questions/3825676/postgresql-regex-word-boundaries.
I tried your solution, but didn't work. I added some samples in my question. – CristisS Jan 24 '13 at 14:20
1

@h2ooooooo My final query is in my answer, but it would've not been possible without your suggestions. Marked as answer and thanks! :) – CristisS Jan 24 '13 at 15:11

score 0 · Answer 2 · answered Jan 24 '13 at 15:10

Using what @h2ooooooo proposed I managed to create the following query:

SELECT cleantwo."ID",cleantwo."Name",cleantwo."Code"
FROM
(
SELECT cleanone."ID",cleanone."Name",unnest(cleanone."Code") as "Code" -- 3. unnest all the entries received using regexp_matches (get all combinations)
FROM 
(
SELECT sd."ID", sd."Name", regexp_matches(sd."Name", '(^|[^\d])(\d+)([^\d]|$)')
    as "Code"
FROM "T_SOME_DATA" sd
WHERE substring(sd."Name" from 1 for 15) ~('(^|[^\d])(\d+)([^\d]|$)') -- 1. get all data possible
) as cleanone
WHERE cleanone."Code" IS NOT NULL -- 2. get data where code IS NOT NULL (remove useless entries)
) as cleantwo
WHERE length(cleantwo."Code")=6 -- 4. get only the combinations relevant to my initial requirement (codes with length 6)<br/>

It took me a lot of time to find this so I hope it helps someone else in the same situation. Good luck!

Regexp to find only 6 digit number in text PostgreSQL

2 Answers2