40

I have an interesting conundrum which I believe can be solved in purely SQL. I have tables similar to the following:

responses:

user_id | question_id | body
----------------------------
1       | 1           | Yes
2       | 1           | Yes
1       | 2           | Yes
2       | 2           | No
1       | 3           | No
2       | 3           | No


questions:

id | body
-------------------------
1 | Do you like apples?
2 | Do you like oranges?
3 | Do you like carrots?

and I would like to get the following output

user_id | Do you like apples? | Do you like oranges? | Do you like carrots?
---------------------------------------------------------------------------
1       | Yes                 | Yes                  | No
2       | Yes                 | No                   | No

I don't know how many questions there will be, and they will be dynamic, so I can't just code for every question. I am using PostgreSQL and I believe this is called transposition, but I can't seem to find anything that says the standard way of doing this in SQL. I remember doing this in my database class back in college, but it was in MySQL and I honestly don't remember how we did it.

I'm assuming it will be a combination of joins and a GROUP BY statement, but I can't even figure out how to start.

Anybody know how to do this? Thanks very much!

Edit 1: I found some information about using a crosstab which seems to be what I want, but I'm having trouble making sense of it. Links to better articles would be greatly appreciated!

Mahmoud Gamal
  • 78,257
  • 17
  • 139
  • 164
Topher Fangio
  • 20,372
  • 15
  • 61
  • 94

5 Answers5

52

Use:

  SELECT r.user_id,
         MAX(CASE WHEN r.question_id = 1 THEN r.body ELSE NULL END) AS "Do you like apples?",
         MAX(CASE WHEN r.question_id = 2 THEN r.body ELSE NULL END) AS "Do you like oranges?",
         MAX(CASE WHEN r.question_id = 3 THEN r.body ELSE NULL END) AS "Do you like carrots?"
    FROM RESPONSES r
    JOIN QUESTIONS q ON q.id = r.question_id
GROUP BY r.user_id

This is a standard pivot query, because you are "pivoting" the data from rows to columnar data.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • 4
    So you're saying I have to build a dynamic query based on the number of questions that I have? I guess I could do that, but I was hoping for a more simple solution. – Topher Fangio Jan 20 '10 at 05:28
  • @Topher: Oracle & SQL Server have `PIVOT` and `UNPIVOT`, but if you check the pivot tag you'll see that dynamic queries are common even with the function. – OMG Ponies Jan 20 '10 at 05:33
  • 1
    Thanks for the answer. Looks like this will be the easiest to implement even if I have to generate the query at runtime. – Topher Fangio Jan 20 '10 at 05:40
  • This query doesn't work in PostgreSQL. The single quotes have to be double quotes. – Peter Eisentraut Jan 20 '10 at 07:15
  • What if you have a case when there is a user who hasn't answered any questions (neither yes nor no) and I just want to add that user with null values for question columns? – Ahmadov Nov 06 '13 at 13:21
  • 1
    what if you have dynamic number of rows (not 3). How would you transpose then? – UserBSS1 Jul 02 '14 at 10:27
  • See my answer below for a completely dynamic solution. – Hannes Landeholm Oct 19 '15 at 01:07
13

I implemented a truly dynamic function to handle this problem without having to hard code any specific class of answers or use external modules/extensions. It also gives full control over column ordering and supports multiple key and class/attribute columns.

You can find it here: https://github.com/jumpstarter-io/colpivot

Example that solves this particular problem:

begin;

create temporary table responses (
    user_id integer,
    question_id integer,
    body text
) on commit drop;

create temporary table questions (
    id integer,
    body text
) on commit drop;

insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');

select colpivot('_output', $$
    select r.user_id, q.body q, r.body a from responses r
        join questions q on q.id = r.question_id
$$, array['user_id'], array['q'], '#.a', null);

select * from _output;

rollback;

This outputs:

 user_id | 'Do you like apples?' | 'Do you like carrots?' | 'Do you like oranges?' 
---------+-----------------------+------------------------+------------------------
       1 | Yes                   | No                     | Yes
       2 | Yes                   | No                     | No
Hannes Landeholm
  • 1,525
  • 2
  • 17
  • 32
  • 1
    Very nice! Thanks for sharing and for making it open source! Would love to see some benchmarks about it's performance though (particularly here on SO as new searchers will want to be confident in it's capabilities). – Topher Fangio Oct 19 '15 at 17:12
  • 1
    how can I get rid of the quote? in the column names – Diego Jun 23 '17 at 04:20
6

You can solve this example with the crosstab function in this way

drop table if exists responses;
create table responses (
user_id integer,
question_id integer,
body text
);

drop table if exists questions;
create table questions (
id integer,
body text
);

insert into responses values (1,1,'Yes'), (2,1,'Yes'), (1,2,'Yes'), (2,2,'No'), (1,3,'No'), (2,3,'No');
insert into questions values (1, 'Do you like apples?'), (2, 'Do you like oranges?'), (3, 'Do you like carrots?');

select * from crosstab('select responses.user_id, questions.body, responses.body from responses, questions where questions.id = responses.question_id order by user_id') as ct(userid integer, "Do you like apples?" text, "Do you like oranges?" text, "Do you like carrots?" text);

First, you must install tablefunc extension. Since 9.1 version you can do it using create extension:

CREATE EXTENSION tablefunc;
Francisco Puga
  • 23,869
  • 5
  • 48
  • 64
4

I wrote a function to generate the dynamic query. It generates the sql for the crosstab and creates a view (drops it first if it exists). You can than select from the view to get your results.

Here is the function:

CREATE OR REPLACE FUNCTION public.c_crosstab (
  eavsql_inarg varchar,
  resview varchar,
  rowid varchar,
  colid varchar,
  val varchar,
  agr varchar
)
RETURNS void AS
$body$
DECLARE
    casesql varchar;
    dynsql varchar;    
    r record;
BEGIN   
 dynsql='';

 for r in 
      select * from pg_views where lower(viewname) = lower(resview)
  loop
      execute 'DROP VIEW ' || resview;
  end loop;   

 casesql='SELECT DISTINCT ' || colid || ' AS v from (' || eavsql_inarg || ') eav ORDER BY ' || colid;
 FOR r IN EXECUTE casesql Loop
    dynsql = dynsql || ', ' || agr || '(CASE WHEN ' || colid || '=''' || r.v || ''' THEN ' || val || ' ELSE NULL END) AS ' || agr || '_' || r.v;
 END LOOP;
 dynsql = 'CREATE VIEW ' || resview || ' AS SELECT ' || rowid || dynsql || ' from (' || eavsql_inarg || ') eav GROUP BY ' || rowid;
 RAISE NOTICE 'dynsql %1', dynsql; 
 EXECUTE dynsql;
END

$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100;

And here is how I use it:

SELECT c_crosstab('query_txt', 'view_name', 'entity_column_name', 'attribute_column_name', 'value_column_name', 'first');

Example: Fist you run:

SELECT c_crosstab('Select * from table', 'ct_view', 'usr_id', 'question_id', 'response_value', 'first');

Than:

Select * from ct_view;
SunWuKung
  • 527
  • 4
  • 16
-2

There is an example of this in contrib/tablefunc/.

Peter Eisentraut
  • 35,221
  • 12
  • 85
  • 90
  • 1
    Umm, where is `contrib/tablefunc`? Are you talking about a directory on the doc server? – Topher Fangio Jan 20 '10 at 05:24
  • It's in that directory in the source tree, or you might find a `postgresql-contrib` binary package that you need to install that contains it. – Peter Eisentraut Jan 20 '10 at 07:13
  • 1
    Downvoted for poor quality answer. You haven't provided any context, haven't provided a section of the source material (in case it changes) and haven't made any effort to relate it to the question. Refer to this for better answers http://stackoverflow.com/help/how-to-answer – coagmano Dec 05 '15 at 02:52