3

I've that kind of string

Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z

My required result it's

 Column1  Column 2     Column3
 Test 1   new york        X
 Test 2   chicago         Y
 Test 3   harrisburg,pa   Z

But running this query

SELECT  
split_part(stat.st, '|', 1) Column1,
split_part(stat.st, '|', 2) Column2,    
split_part(stat.st, '|', 3) Column3
FROM
(
    SELECT
            UNNEST (
                string_to_array('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z',',')
            )
         AS st
) stat;

Result is

 Column1  Column 2   Column3
 Test 1   new york      X
 Test 2   chicago       Y
 Test 3   harrisburg    
 pa          Z  

Column3 could be everything (except | ). Possible pattern to match it's .This could be repeated N times. STRING could be everything except | char.

How could I use regexp_split_to_array() to have my desire result set?

Luigi Saggese
  • 5,299
  • 3
  • 43
  • 94

1 Answers1

4

There is barely enough information to make this work. But this does the job:

SELECT * FROM crosstab3(
   $$
   SELECT (rn/3)::text AS x, (rn%3)::text, item
   FROM  (
      SELECT row_number() OVER () - 1 AS rn, trim(item) AS item
      FROM (
         SELECT CASE WHEN rn%2 = 1 THEN regexp_split_to_table(item, ',') 
                     ELSE item END AS item
         FROM  (
            SELECT row_number() OVER () AS rn, *
            FROM regexp_split_to_table('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z', '\|') AS item
            ) x
         ) y
      ) z
   $$)

Returns:

 row_name | category_1 |   category_2   | category_3
----------+------------+----------------+------------
 0        | Test 1     | new york       | X
 1        | Test 2     | chicago        | Y
 2        | Test 3     | harrisburg, pa | Z

After splitting the string at |, I build on the criterion that only lines with uneven row number shall be split at ,.
I trim() the results and add derivatives of another row_number() to arrive at this intermediary state before doing the cross tabulation:

 x | text |      item
---+------+----------------
 0 | 0    | Test 1
 0 | 1    | new york
 0 | 2    | X
 1 | 0    | Test 2
 1 | 1    | chicago
 1 | 2    | Y
 2 | 0    | Test 3
 2 | 1    | harrisburg, pa
 2 | 2    | Z

Finally, I apply the crosstab3() function from the tablefunc module. To install it, if you haven't already:

CREATE EXTENSION tablefunc;

Pre-process with regexp_replace()

Here is an alternative that may be easier to comprehend. Not sure which is faster. Complex regular expressions tend to be expensive:

SELECT trim(split_part(a,'|', 1)) AS column1
      ,trim(split_part(a,'|', 2)) AS column2
      ,trim(split_part(a,'|', 3)) AS column3
FROM  (
   SELECT unnest(
             string_to_array(
                         regexp_replace('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z'
                        ,'([^|]*\|[^|]*\|[^,]*),', '\1~^~', 'g'), '~^~')) AS a
   ) sub

This one replaces commas (,) only after two pipes (|), before proceeding.
Now using * instead of + to allow for empty strings between the pipes.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    Please check your answer. I've tried that, but result it's different.
     0 T e s
    1 t  1
    2 | n e
    3 w  y
    4 o r k
    5 |  X
    6   
    7 T e s
    8 t  2
    9 |  c
    10 h i c
    11 a g o
    12 | Y ,
    13  T e
    14 s t 
    15 3 | 
    16 h a r
    17 r i s
    18 b u r
    19 g , 
    20 p a |
    21  Z
    
    – Luigi Saggese Mar 27 '13 at 17:44
  • @LuigiSaggese: I have tested with PostgreSQL 9.1 before I posted. I can reproduce my result. You may need to set [`standard_conforming_strings = on`](http://www.postgresql.org/docs/9.2/static/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS) - which is on by default in 9.1. – Erwin Brandstetter Mar 27 '13 at 17:56
  • @LuigiSaggese: It's a kind of ... ;) – Erwin Brandstetter Mar 27 '13 at 18:05
  • Last Solution it's amazing! Could you "repair" it for cases like this '||,' or 'Magic||' ? – Luigi Saggese Mar 27 '13 at 18:26
  • 2
    @LuigiSaggese: I believe I did now. – Erwin Brandstetter Mar 27 '13 at 18:36