2

I have a database structure in PostgreSQL that looks something like this:

DROP TABLE IF EXISTS  medium  CASCADE;
DROP TABLE IF EXISTS  works   CASCADE;
DROP DOMAIN IF EXISTS nameVal CASCADE;
DROP DOMAIN IF EXISTS numID   CASCADE;
DROP DOMAIN IF EXISTS alphaID CASCADE;

CREATE DOMAIN alphaID   AS VARCHAR(10);
CREATE DOMAIN numID     AS INT;
CREATE DOMAIN nameVal   AS VARCHAR(40);

CREATE TABLE works (
   w_alphaID    alphaID     NOT NULL,
   w_numID      numID       NOT NULL,
   w_title      nameVal     NOT NULL,
   PRIMARY KEY(w_alphaID,w_numID));


CREATE TABLE medium (
   m_alphaID    alphaID     NOT NULL,
   m_numID      numID       NOT NULL,
   m_title      nameVal     NOT NULL,
   FOREIGN KEY(m_alphaID,m_numID) REFERENCES 
      works ON UPDATE CASCADE ON DELETE CASCADE);

INSERT INTO works VALUES('AB',1,'Sunset'),
                        ('CD',2,'Beach'),
                        ('EF',3,'Flower');

INSERT INTO medium VALUES('AB',1,'Wood'),
                         ('AB',1,'Oil'),
                         ('CD',2,'Canvas'),
                         ('CD',2,'Oil'),
                         ('CD',2,'Bronze'),
                         ('EF',3,'Paper'),
                         ('EF',3,'Pencil');
SELECT * FROM works;
SELECT * FROM medium;

SELECT w_alphaID AS alphaID, w_numID AS numID, w_title AS
       Name_of_work, m_title AS Material_used 
     FROM works, medium WHERE 
       works.w_alphaID = medium.m_alphaID 
       AND works.w_numID = medium.m_numID;

The output looks something like this:

 w_alphaid | w_numid | w_title 
-----------+---------+---------
 AB        |       1 | Sunset
 CD        |       2 | Beach
 EF        |       3 | Flower
(3 rows)

 m_alphaid | m_numid | m_title 
-----------+---------+---------
 AB        |       1 | Wood
 AB        |       1 | Oil
 CD        |       2 | Canvas
 CD        |       2 | Oil
 CD        |       2 | Bronze
 EF        |       3 | Paper
 EF        |       3 | Pencil
(7 rows)

 alphaid | numid | name_of_work | material_used 
---------+-------+--------------+---------------
 AB      |     1 | Sunset       | Wood
 AB      |     1 | Sunset       | Oil
 CD      |     2 | Beach        | Canvas
 CD      |     2 | Beach        | Oil
 CD      |     2 | Beach        | Bronze
 EF      |     3 | Flower       | Paper
 EF      |     3 | Flower       | Pencil
(7 rows)

Now my question is what query should I use to have the format of the last SELECT statement to look something like this:

 alphaid | numid | name_of_work | material_used_1 | material_used_2 | material_used_3 
---------+-------+--------------+-----------------+-----------------+---------------
 AB      |     1 | Sunset       | Wood            | Oil             |
 CD      |     2 | Beach        | Canvas          | Oil             | Bronze
 EF      |     3 | Flower       | Paper           | Pencil          |
(3 rows)

I looked into using string_agg() but that puts the values into one cell but I am looking to have a separate cell for each value. I tried using join to see if I can achieve such output but with no success so far. I appreciate you taking the time to look at this question.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
KingMackle
  • 53
  • 1
  • 4

2 Answers2

1

You can use string_agg() in a subquery and then break the string into separate columns. See also this question on how to split string into columns

SELECT alphaID, numID, Name_of_Work
      ,split_part(Material_used, ',', 1) AS Material_used_1
      ,split_part(Material_used, ',', 2) AS Material_used_2
      ,split_part(Material_used, ',', 3) AS Material_used_3
      ,split_part(Material_used, ',', 4) AS Material_used_4
FROM (
    SELECT w_alphaID AS alphaID, w_numID AS numID, w_title AS Name_of_work,
           String_Agg( m_title, ',' ) AS Material_used 
    FROM works, medium 
    WHERE works.w_alphaID = medium.m_alphaID 
       AND works.w_numID = medium.m_numID 
    GROUP BY w_alphaID, w_numID, w_title ) t
Community
  • 1
  • 1
Alex W
  • 3,283
  • 1
  • 19
  • 25
  • The idea is solid. The required `GROUP BY` clause is missing though. And it's safer to work with an array instead of string concatenation. `m_title` might contain the separator used in `string_agg()`, which would lead to errors. I added a solution with `array_agg()` to my answer. – Erwin Brandstetter Oct 20 '15 at 01:00
1

This would be simpler with a simpler schema:

  • No domain types (what's the purpose?)
  • Add an actual PK to table medium
  • Rather use a surrogate PKs (a serial column) instead of the multicolumn PK and FK over two domain types.
    Or at least use the same (simpler) column name for columns with the same content: just alpha_id instead of m_alphaID and w_alphaID etc.

That aside, here are solutions for your setup as is:

True crosstab() solution

There are several specific difficulties for your crosstab() query:

  • No single column that can serve as row_name.
  • Multiple extra columns.
  • No category column.
  • No defined order for values (so I use arbitrary order instead).

Basics (read this first!):

For your special case:

Solution:

SELECT alphaid, numid, name_of_work, material_1, material_2, material_3
FROM   crosstab(
  'SELECT rn, w.alphaid, w.numid, w.name_of_work
        , row_number() OVER (PARTITION BY rn) AS mat_nr  -- order undefined!
        , m_title AS Material_used 
   FROM  (
      SELECT w_alphaID AS alphaid, w_numID AS numid, w_title AS name_of_work
           , row_number() OVER (ORDER BY w_alphaID, w_numID) AS rn
       FROM  works
      ) w
   JOIN   medium m ON w.alphaid = m.m_alphaID 
                  AND w.numid   = m.m_numID
   ORDER  BY rn, mat_nr'
 , 'VALUES (1), (2), (3)'  -- add more ...
)
 AS ct (
    rn bigint, alphaid text, numid int, name_of_work text
  , material_1 text, material_2 text, material_3 text  -- add more ...
   );

Poor man's crosstab with standard SQL

If the additional module tablefunc cannot be installed or if top performance is not important, this simpler query does the same, slower:

SELECT w_alphaid AS alphaid, w_numid AS numid, w_title AS name_of_work
     , arr[1] AS material_used_1
     , arr[2] AS material_used_2
     , arr[3] AS material_used_3 -- add more?
FROM   works w
LEFT  JOIN (
   SELECT m_alphaid, m_numid, array_agg(m_title::text) AS arr
   FROM   medium
   GROUP  BY m_alphaid, m_numid
   ) m ON w.w_alphaid = m.m_alphaid 
      AND w.w_numid   = m.m_numid;
  • The cast to text (or varchar ...) is necessary because there is no predefined array type for your custom domain. Alternatively you could define the missing array type.

  • One subtle difference to the above: using LEFT JOIN here instead of just JOIN to preserve rows from works that have no related materials in medium at all.

  • Since you return the whole table, it's cheaper to aggregate rows in medium before you join. For a small selection it might be cheaper to join first and then aggregate. Related:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228