I am somewhat hesitant to even post my answer. You really must upgrade. Version 8.2 reaches end of live right now. Follow the link @a_horse_with_no_name posted.
However, the problem got my attention. The following demo should work with PostgreSQL 8.2:
SELECT -- content,
rtrim(
regexp_replace(
regexp_replace(
replace(
substring(content
,E'(XXX\\d+|XXX\\d+.*XXX\\d+)') -- kill leading / trailing noise
,',',' ') -- remove all ","
,E'(XXX\\d+)', E'\\1,', 'g') -- terminate X-strings with ","
-- now we can use non-greedy terminated with ","
,E'(XXX\\d+?,)*?.*?(XXX\\d+?,)', E'\\1\\2', 'g')
,',') AS result
FROM (VALUES
('no match')
,('XXX010101')
,('XXX010102 beginn')
,('end XXX010103')
,('middle XXX010104 match')
,('match middle XXX010105 and end XXX010106')
,('XXX010107, match beginn XXX010108 & middle')
,('XXX010109 begin and end XXX010110')
,('XXX01011 begin XXX010112 middle and end XXX010113')
,('XXX01014 begin XXX010115 close matches XX010113 XXXy010113 23624 ,XXX010116')
,('f XXX01017 B XXX010118 457XXX010119 XXXy XXX010120 overkill XXX010121end')
) data(content)
Result:
result
--------------------------------------------------
-- first line is NULL
XXX010101
XXX010102
XXX010103
XXX010104
XXX010105,XXX010106
XXX010107,XXX010108
XXX010109,XXX010110
XXX01011,XXX010112,XXX010113
XXX01014,XXX010115,XXX010116
XXX01017,XXX010118,XXX010119,XXX010120,XXX010121
Some explanation:
- no
regex_matches()
in version 8.2 as OP already stated
- but there is
regexp_replace()
which can use a g
flag (replace g
lobally)
- we cannot mix greedy and non-greedy quantifiers in the same regular expression
- so I terminate the wanted strings with a
,
, after removing all other occurances of ,
Could be any character not part of the wanted strings, but ,
can serve as separator in the result.
- first cut leading and trailing noise
- then globally replace everything in between wanted strings.
- to make that work use
(XXX\\d+?,)*?
to capture any number of leading occurances of wanted string.
final rtrim()
removes trailing ,
In PsotgreSQL 8.3+ you could use regexp_split_to_table()
to split the wanted strings into single rows. In 8.2 you'll have to come up with something of your own. I would write a plgpsql function ...
This makes heavy use of a features of PostgreSQL's POSIX Regular Expressions (link to version 8.2!)