1

I have a query that should pivot a table with large number of values in a certain column that should be transformed into the column names of the pivoted table, i.e.,

   SELECT *
   FROM   crosstab(
   $$SELECT user_id, item_id, rating
   FROM   tbl
   ORDER  BY 1,2$$  
   ) AS ct ("user_id" varchar(50), "select distinct item_id from tbl ORDER BY 1" varchar(50)[]);

This attempt results in an error: return and sql tuple descriptions are incompatible. I am sure that the type of user_id and item_id columns is varchar(50). However, I am not sure what should be the type of the second argument that is a query.

This query has been generated following the example given in the first answer at PostgreSQL Crosstab Query. It mentions that the second argument can be a query.

Any help to fix this problem would be great. Thank you.

Edit:

For the purpose of providing MNE, the table tbl can be created by:

   CREATE TABLE tbl (
   user_id   varchar(50)
 , item_id   varchar(50)
 , rating    integer
);

INSERT INTO tbl VALUES 
  ('A', 'item_0', 1), ('A', 'item_3', 2)
, ('B', 'item_1', 4), ('B', 'item_2', 5)
                    , ('C', 'item_0', 3);

The desired outcome is:

user_id     item_0     item_1     item_2     item_3
A           1                                2
B                      4          5
C           3

Note that, in the actual scenario, the number of distinct values in the item_id column of table tbl is about 2000.

Marina
  • 330
  • 1
  • 6
  • 15

1 Answers1

2

crosstab is an overloaded function and it seems that the version you're after is the one that takes two string arguments: crosstab ( source_sql text, category_sql text ) → setof record.

So your query should look something like (I can't test it since your questions doesn't provide a sample input and output):

select * from crosstab(
  -- 1st query: source_sql
  'SELECT user_id, item_id, rating FROM tbl ORDER BY 1,2',

  -- 2nd query: category_sql
  'select distinct item_id from tbl ORDER BY 1'
) AS (user_id varchar(50), ...);

The two query strings are actually arguments to the crosstab call, not part of the specifier situated after the AS keyword.


Edit: Here's a version of the query that works on your sample data, but it also shows that the number of categories needs to be fixed:

select * from crosstab(
  'select * from tbl',
  'select distinct item_id from tbl order by item_id asc limit 4'
) as (
  user_id varchar(50),
  -- The number of items here must match the LIMIT clause above and you must
  -- make sure that the query will indeed return 4 item_id's.
  item_1 varchar(50),
  item_2 varchar(50),
  item_3 varchar(50),
  item_4 varchar(50)
);
Ionuț G. Stan
  • 176,118
  • 18
  • 189
  • 202
  • Thank you for your answer. I edited my post to include MNE. The proposed approach results in a different error: column "SELECT user_id, item_id, rating FROM tbl ORDER BY 1,2" does not exist. – Marina Mar 10 '23 at 17:12
  • @Marina there was a typo in my sample query — I used double quotes instead of single quotes for the SQL strings. In any case, now I better understand what you're after and it's not quite as straightforward because even if the 2nd query provides the category names dynamically, the list of columns after `AS` is fixed. You'd need a way to make even the `AS` fragment to be part of a string and then execute that string, which is possible inside a function: https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN – Ionuț G. Stan Mar 10 '23 at 20:05
  • @Marina, actually, I don't think there's a way to dynamically specify the set of columns based on a query. Postgres needs to know the set of columns in the result set. If you could add some context as to where you want to use this, than maybe there's a workaround at another place in the tech stack. – Ionuț G. Stan Mar 10 '23 at 20:19