Pivot table with multiple value columns

Question

I have a Postgres table with products data from different manufacturers, here the simplified table structure:

CREATE TABLE test_table (
  sku               text,
  manufacturer_name text,
  price             double precision,
  stock             int
);

INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
       ('sku1', 'Manufacturer2', 120.00, 15),
       ('sku1', 'Manufacturer3', 130.00, 1),
       ('sku1', 'Manufacturer3', 30.00, 11),
       ('sku2', 'Manufacturer1', 10.00, 2),
       ('sku2', 'Manufacturer2', 9.00,  3),
       ('sku3', 'Manufacturer2', 21.00, 3),
       ('sku3', 'Manufacturer2', 1.00, 7),
       ('sku3', 'Manufacturer3', 19.00, 5);

I need to output each Manufacturer for each sku but if there are several identical manufacturers for the same sku I need to select the Manufacturer with the lowest price (note that I also need to include 'stock' column), here desired results:

| sku  | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0      | 22         | 120.0      | 15         | 30.0       | 11         |
| sku2 | 10.0       | 2          | 9.0        | 3          |            |            |
| sku3 |            |            | 1.0        | 7          | 19.0       | 5          |

I tried to use Postgres crosstab():

SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
              FROM test_table
              ORDER BY 1,2',
              $$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
       )
       AS ct (sku text, "man1_price" double precision,
              "man2_price" double precision,
              "man3_price" double precision
    );

But this produces a table with only one price column. And I didn't find a way to include the stock column.

I also tried to use conditional aggregation:

SELECT sku,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku

And this query also doesn't work in my case - it simply selects min stock level - but if there are few the same Manufacturers for the same sku but with different prices/stocks - this query select min price from one manufacturer and min stock from another.

How can I output each manufacturer's price and corresponding stock from that table?

P.S. Thank you all for such helpful answers. My Postgres table is rather small - there no more than 15k of products, (I don't know if such numbers can be useful for proper comparing) but since Erwin Brandstetter asked to compare different queries performance I ran 3 queries with EXPLAIN ANALYZE, here is their execution time:

Erwin Brandstetter query:        400 - 450 ms 
Kjetil S query:                  250 - 300 ms
Gordon Linoff query:             200 - 250 ms
a_horse_with_no_name query:      250 - 300 ms

Again - I'm not sure if those numbers can be useful as a reference. For my case, I chose the combined version of Kjetil S and Gordon Linoff queries but Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It's worth noting that if my table in the future would end up having more then few Manufacturers - adjusting query and typing their names each time would be tiresome - and hence the query from a_horse_with_no_name answer would be the most convenient to use.

Your expected output doesn't match the "*I need to select a Manufacturer with min 'price'*" requirement - you are showing all manufactures as columns, not the one with the smallest price. — , Mar 22 '19 at 09:33
Oh, I'm sorry - I'll fix it, overall I need to output each Manufacturer for each sku but if there are duplicate Manufacturer for the same sku I need to select the record with the lowest price. — Vlad, Mar 22 '19 at 09:39
That will be very complicated (not to say impossible) because it means that the number of columns can change for each SKU e.g. one SKU where all manufacturers have the same price would result in one column only, if there are three but two have the same price, then it would be two columns and so on. — , Mar 22 '19 at 09:41
How and where are you using that result? This would be a lot easier if the price/stock information could be returned as e.g. a single JSON column. — , Mar 22 '19 at 09:42
This table is very simplified version - final results should include not only price and stock but also few other columns. The output later sends to JS framework (Ag-Grid that output data as a pivot table). I think that outputting data as Json column can work in this case - can you show a sample? — Vlad, Mar 22 '19 at 09:53

Kjetil S. · Accepted Answer · 2019-03-22T10:00:59.857

Your last select almost works. But you should add a where condition where rows with non-minimum prices per sku per manufacturer are removed. This produces your expected result:

select
  sku,
  min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
  min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
  min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
  min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
  min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
  min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1;

Thank you - this query produces output exactly what I need. – Vlad Mar 22 '19 at 10:19 — Vlad, Mar 22 '19 at 10:19

score 1 · Answer 2 · answered Mar 22 '19 at 09:56

I find using a JSON result much easier these days then using a complicated pivot. Producing a single aggregated JSON value doesn't break the inherent restriction of SQL that the number of columns must be known before the query is executed (and must be the same for all rows).

You could using something like this:

select sku, 
       jsonb_object_agg(manufacturer_name, 
                          jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)) as price_info
from (
  select sku, 
         manufacturer_name,
         price, 
         min(price) over (partition by sku) as min_price,
         stock
  from test_table
) t
group by sku;

The above returns the following result using your sample data:

sku  | price_info                                                                                                                                                                                             
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}                                                                       
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}

I chose another answer since it produces a plain table in the required format, but I really like your suggestion of using Json for such queries. — Vlad, Mar 22 '19 at 10:19
@Vlad: the advantage of the JSON is, that you don't need to adjust your query if you get more manufacturers. — , Mar 22 '19 at 10:22

score 1 · Answer 3 · answered Mar 22 '19 at 11:24

I would use distinct on to limit the data to one manufacturer to one price. And I like the filter functionality in Postgres. So:

select sku,
       max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select distinct on (manufacturer_name, sku) t.*
      from test_table t
      order by manufacturer_name, sku, price
     ) t
group by sku
order by sku;

Erwin Brandstetter · Answer 4 · 2019-03-22T13:31:48.327

crosstab() must provide a static column definition list. Your 2nd parameter:

$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$

... provides a dynamic list of values that would require a dynamic column definition list. That's not going to work - except by incidence.

The core problem of your task is that crosstab() expects a single value column from the query in its first parameter. But you want to process two value columns per row (price and stock).

One way around this is to pack multiple values in a composite type and extract values in the outer SELECT.

Create a composite type once:

CREATE TYPE price_stock AS (price float8, stock int);

A temporary table or view also serves the purpose.
Then:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, (price, stock)::price_stock
    FROM   test_table
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 price_stock
            , man2 price_stock
            , man3 price_stock
    );

For a quick test, or if the row of your underlying table isn't too wide, you can also just use its row type, without creating a custom type:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, t
    FROM   test_table t
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 test_table
            , man2 test_table
            , man3 test_table
    );

db<>fiddle here

@Vlad: If your table is big I would be interested how this performs as compared to Kjetil's query. Could you run both with `EXPLAIN ANALYZE`, or with `\timing` in psql? — Erwin Brandstetter, Mar 22 '19 at 13:13
Thank you @Erwin Brandstetter for another great variant with helpful explanations. My table is small but nonetheless - I added comparing results to question. — Vlad, Mar 23 '19 at 09:40
Thanks for providing results. Typically, `crosstab()` is fastest. The overhead of forming a composite type and unnesting values back is weighing in, though. — Erwin Brandstetter, Mar 23 '19 at 12:00

Pivot table with multiple value columns

4 Answers4