6

I have the query In Oracle SQL:

       select town_name, 
              regexp_substr(town_name, '[^A,]+', 1, 1) as c1,
              regexp_substr(town_name, '[^A,]+', 1, 2) as c2, 
              regexp_substr(town_name, '[^A,]+', 1, rownum) as c_rownum,
              rownum
          from epr_towns

The first 2 rows from the result are:

VALGANNA        V   LG  V   1
VARANO BORGHI   V   R   R   2

I need to obtain the same result on PostgreSQL (for the row with regexp_substr(town_name, '[^A,]+', 1, rownum) as c_rownum), and I don't know how. Could you help me? Thanks.

Catalin Vladu
  • 389
  • 1
  • 6
  • 17
  • The input is "town_name" ("VALGANNA" for the first row). – Catalin Vladu Aug 18 '15 at 09:28
  • The actual values are VALGANNA and VARANO BORGHI (column 1). The columns 2..4 are the results obtained with "regexp_substr" in Oracle. – Catalin Vladu Aug 18 '15 at 09:43
  • Warning! The regex of the format '[^A,]+' commonly used for parsing delimited strings fails when there are NULL elements in the list. Make sure you test thoroughly. Always expect the unexpected! See here for more details and an improved regex. I don't know how it would translate to Postgres though: https://stackoverflow.com/questions/31464275/split-comma-separated-values-to-columns-in-oracle/31464699#31464699 – Gary_W Aug 18 '15 at 13:14

3 Answers3

7

There's really two separate problems here

  • replacing rownum
  • replacing regexp_substr with regexp_matches

To solve for rownum, use a CTE (WITH clause) to add a rownum-like column to your underlying table.

regexp_matches works a little differently than Oracle regexp_substr. While Oracle regexp_substr takes the nth match as an argument, PostgreSQL regexp_matches will return all matches as a table-valued function. So you have to wrap the call in a subquery with limit/offset to pluck out the nth match. Also, the rows returned by regexp_substr are arrays, so assuming you have no parenthesized expressions in your regexp, you need to index/dereference the first item in the array.

End result looks like this:

http://sqlfiddle.com/#!17/865ee/7

 with epr_towns_rn as (
    select town_name,
      row_number() over(order by town_name) as rn
  from epr_towns
)
select town_name,
   (select (regexp_matches(town_name, '[^A,]+', 'g'))[1] offset 0 limit 1) as c1,
   (select (regexp_matches(town_name, '[^A,]+', 'g'))[1] offset 1 limit 1) as c2,
   (select (regexp_matches(town_name, '[^A,]+', 'g'))[1] offset rn-1 limit 1)
     as c_rownum,
   rn
   from epr_towns_rn;

If you only wanted the first match, you could leave out the 'g' argument and leave out the limit/offset from the subquery but you still need the subquery wrapper in case there's no match, to mimic regexp_substr returning null when no match.

wrschneider
  • 17,913
  • 16
  • 96
  • 176
3

You can find your case and other scenarios on this page: https://pgxn.org/dist/orafce/doc/sql_migration/sql_migration03.html

--

In short you can do something like that: from OracleDB:

SELECT
    REGEXP_SUBSTR('one two three four five ',
    '(\S*)\s') AS "REGEXP"
FROM DUAL;

Result: one

to PostgreSQL:

SELECT (
    SELECT array_to_string(a, '') AS "REGEXP"
    FROM regexp_matches('one two three four five ',
    '(\S*)\s',
    'g') AS f(a)
    LIMIT 1
);

Result: one

snakecharmerb
  • 47,570
  • 11
  • 100
  • 153
Stas
  • 43
  • 4
1

I don't have a table, so I use generate for example:

   select town_name, 
          regexp_substr(town_name, '[^A,]+', 1, 1) as c1,
          regexp_substr(town_name, '[^A,]+', 1, 2) as c2, 
          regexp_substr(town_name, '[^A,]+', 1, dense_rank() over (order by town_name)) as c_rownum,
          dense_rank() over (order by c)
      from epr_towns

I believe you look for dense_rank window function?..

NB. it is always easier to have SQL fiddle or initial code

Vao Tsun
  • 47,234
  • 13
  • 100
  • 132
  • I am have been looking for dense_rank() and conversion of regexp_substr() with 4 parameters (for 2 parameters, it works "substring(town_name, '[^A,]+')" ) – Catalin Vladu Aug 18 '15 at 12:27