1

I need to keep on the line those words that includes part of a sub-string. Ex:

select regexp_substr('libstdc++-devel-4.8.5-39.0.3.el7.x86_64, python-kitchen-1.1.1-5.el7.noarch, nfs-utils-1.3.0-0.65.0.1.el7.x86_64, git-1.8.3.1-20.el7.x86_64', 
       '(.+?)(devel|git|nfs).+?\,') st from dual;

This query returns libstdc++-devel-4.8.5-39.0.3.el7.x86_64, but I also need git-1.8.3.1-20.el7.x86_64 and nfs-utils-1.3.0-0.65.0.1.el7.x86_64 based on the pattern. Note that I have dozens of words on each row (not only those 4) so I need to adjust the pattern, to expand the result (like adding grub, perl, etc). Thank you.

misurex
  • 35
  • 3
  • 1
    Looks like your solution is [here](https://stackoverflow.com/questions/17596363/oracle-11g-get-all-matched-occurrences-by-a-regular-expression) – thang Mar 29 '20 at 04:23

1 Answers1

2

You have a couple of issues with your regex; it will match (for example) python-kitchen-1.1.1-5.el7.noarch, nfs-utils-1.3.0-0.65.0.1.el7.x86_64, because it allows space and commas before the match to one of devel,git or nfs (in (.+?)). Also, it won't match git-1.8.3.1-20.el7.x86_64 because it requires a comma at the end of the match. You can fix those issues by replacing (.+?) with [^ ]* and \, with (,|$). Then you can use a solution derived from this question (although in my preferred form, checking for NOT NULL in the CONNECT BY) to get all your results:

SELECT REGEXP_SUBSTR(str, '[^ ]*(devel|git|nfs).+?(,|$)', 1, level) AS lib
FROM (
  SELECT 'libstdc++-devel-4.8.5-39.0.3.el7.x86_64, python-kitchen-1.1.1-5.el7.noarch, nfs-utils-1.3.0-0.65.0.1.el7.x86_64, git-1.8.3.1-20.el7.x86_64' as str
  FROM dual
)
CONNECT BY REGEXP_SUBSTR(str, '[^ ]*(devel|git|nfs).+?(,|$)', 1, level) IS NOT NULL

Output:

LIB
libstdc++-devel-4.8.5-39.0.3.el7.x86_64,
nfs-utils-1.3.0-0.65.0.1.el7.x86_64,
git-1.8.3.1-20.el7.x86_64

Note this leaves commas on the end of some of the names, if you want to get rid of them, just use RTRIM:

SELECT RTRIM(REGEXP_SUBSTR(str, '[^ ]*(devel|git|nfs).+?(,|$)', 1, level), ',') AS lib
FROM (
  SELECT 'libstdc++-devel-4.8.5-39.0.3.el7.x86_64, python-kitchen-1.1.1-5.el7.noarch, nfs-utils-1.3.0-0.65.0.1.el7.x86_64, git-1.8.3.1-20.el7.x86_64' as str
  FROM dual
)
CONNECT BY REGEXP_SUBSTR(str, '[^ ]*(devel|git|nfs).+?(,|$)', 1, level) IS NOT NULL

Output:

LIB
libstdc++-devel-4.8.5-39.0.3.el7.x86_64
nfs-utils-1.3.0-0.65.0.1.el7.x86_64
git-1.8.3.1-20.el7.x86_64

If you want to get all the values on one line, you can use the first query as a CTE and then use LISTAGG on the result:

WITH CTE AS (
  SELECT level AS l, REGEXP_SUBSTR(str, '[^ ]*(devel|git|nfs).+?(,|$)', 1, level) AS lib
  FROM (
    SELECT 'libstdc++-devel-4.8.5-39.0.3.el7.x86_64, python-kitchen-1.1.1-5.el7.noarch, nfs-utils-1.3.0-0.65.0.1.el7.x86_64, git-1.8.3.1-20.el7.x86_64' as str
    FROM dual
  )
  CONNECT BY REGEXP_SUBSTR(str, '[^ ]*(devel|git|nfs).+?(,|$)', 1, level) IS NOT NULL
)
SELECT LISTAGG(lib) WITHIN GROUP (ORDER BY l) AS lib
FROM CTE

Output:

LIB
libstdc++-devel-4.8.5-39.0.3.el7.x86_64,nfs-utils-1.3.0-0.65.0.1.el7.x86_64,git-1.8.3.1-20.el7.x86_64

Demo on dbfiddle

Nick
  • 138,499
  • 22
  • 57
  • 95
  • I clicked your answer as useful, but as I mentioned, I need to keep the result on the same row, not separated. I actually tried to use (without success): select regexp_replace('libstdc++-devel-4.8.5-39.0.3.el7.x86_64, python-kitchen-1.1.1-5.el7.noarch, nfs-utils-1.3.0-0.65.0.1.el7.x86_64, git-1.8.3.1-20.el7.x86_64', '[^ ]*(devel|git|nfs).+?(,|$)', '\1, ') st from dual; Using TRIM is the easy part, I need this result: libstdc++-devel-4.8.5-39.0.3.el7.x86_64, git-1.8.3.1-20.el7.x86_64, nfs-utils-1.3.0-0.65.0.1.el7.x86_64.Thanks – misurex Mar 29 '20 at 18:18
  • @misurex Apologies - I missed that part of the question. The easiest way to get the result on one line is to `LISTAGG` the result of the first query as trying to replace things which do not match is quite tricky, especially without lookaheads. See my edit. – Nick Mar 30 '20 at 00:21
  • Thank you Nick, I can use this workaround, thank you. – misurex Mar 30 '20 at 01:11