5

I'm trying to get data out of a table for a survey in a particular format. However all my attempts seems to hand the DB because of too many joins/too heavy on the DB.

My data looks like this:

id, user, question_id, answer_id, 
1,   1,   1,           1
3,   1,   3,           15
4,   2,   1,           2
5,   2,   2,           12
6,   2,   3,           20

There are roughly 250,000 rows and each user has about 30 rows. I want the result to look like:

user0, q1, q2,   q3 
1,     1,  NULL, 15
2,     2,  12,   20 

So that each user has one row in the result, each with a separate column for each answer.

I'm using Postgres but answers in any SQL language would be appreciated as I could translate to Postgres.

EDIT: I also need to be able to deal with users not answering questions, i.e. in the example above q2 for user 1.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Yule
  • 9,668
  • 3
  • 51
  • 72

3 Answers3

6

Consider the following demo:

CREATE TEMP TABLE qa (id int, usr int, question_id int, answer_id int);
INSERT INTO qa VALUES
 (1,1,1,1)
,(2,1,2,9)
,(3,1,3,15)
,(4,2,1,2)
,(5,2,2,12)
,(6,2,3,20);

SELECT *
FROM   crosstab('
    SELECT usr::text
          ,question_id
          ,answer_id
    FROM qa
    ORDER BY 1,2')
 AS ct (
     usr text
    ,q1 int
    ,q2 int
    ,q3 int);

Result:

 usr | q1 | q2 | q3
-----+----+----+----
 1   |  1 |  9 | 15
 2   |  2 | 12 | 20
(2 rows)

user is a reserved word. Don't use it as column name! I renamed it to usr.

You need to install the additional module tablefunc which provides the function crosstab(). Note that this operation is strictly per database. In PostgreSQL 9.1 you can simply:

CREATE EXTENSION tablefunc;

For older version you would execute a shell-script supplied in your contrib directory. In Debian, for PostgreSQL 8.4, that would be:

psql mydb -f /usr/share/postgresql/8.4/contrib/tablefunc.sql
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This is where I went initially, but I'm struggling to cope with missing data, edited question. So if a user doesn't have a row for question 2, it should still output a null (or 0). Have an upvote for effort anyway :) – Yule Dec 13 '11 at 17:03
  • Crosstab syntax can be a bit challenging...you'll need a row returned for each answer for each user in order for it to populate correctly. Inside the quoted area in the cross tab report, build a table (subquery can handle it..something like from user inner join questions on 1=1. Remember the order by!). Then left join this back to your data to populate the values. Otherwise missing answers for a user will offset the fields incorrectly. – Twelfth Dec 13 '11 at 19:02
  • 1
    @Twelfth: There is the variant with two parameters [`crosstab(text, text)`](http://www.postgresql.org/docs/current/interactive/tablefunc.html#AEN138286) that takes care of missing categories by providing an explicit list of the categories in the second parameter. More details in [this recent answer](http://stackoverflow.com/questions/11074489/postgres-buckets-always-filled-from-left-in-crosstab-query/11075727#11075727). – Erwin Brandstetter Jun 20 '12 at 18:55
  • @ErwinBrandstetter: Thanks, that actually simplifies this a lot. – Twelfth Jun 21 '12 at 17:03
3

Erwins answer is good, until missing answer for a user shows up. I'm going to make an assumption on you....you have a users table that has one row per user and you have a questions table that has one row per questions.

select usr, question_id
from users u inner join questions q on 1=1
order by 1,

This statement will create a row for every user/question, and be in the same order. Turn it into a subquery and left join it to your data...

select usr,question_id,qa.answer_id
from
(select usr, question_id
from users u inner join questions q on 1=1
)a
left join qa on qa.usr = a.usr and qa.question_id = a.usr
order by 1,2

Plug that into Erwins crosstab statement and give him credit for the answer :P

Twelfth
  • 7,070
  • 3
  • 26
  • 34
  • Just to add...you will need to define the crosstab results for each field returned. If you have 3 answers, AS ct ( usr text ,q1 int ,q2 int ,q3 int); works. If you have 30, be prepared to define each one like this. If you have a new question added, the crosstab select statement will pick it up, but you will need to ensure you add the field to this ct() list – Twelfth Dec 13 '11 at 19:15
1

I implemented a truly dynamic function to handle this problem without having to hard code any specific number of questions or use external modules/extensions. It also much simpler to use than crosstab().

You can find it here: https://github.com/jumpstarter-io/colpivot

Example that solves this particular problem:

begin;

create temp table qa (id int, usr int, question_id int, answer_id int);
insert into qa values
 (1,1,1,1)
,(2,1,2,9)
,(3,1,3,15)
,(4,2,1,2)
,(5,2,2,12)
,(6,2,3,20);

select colpivot('_output', $$
    select usr, ('q' || question_id::text) question_id, answer_id from qa
$$, array['usr'], array['question_id'], '#.answer_id', null);

select * from _output;

rollback;

Result:

 usr | 'q1' | 'q2' | 'q3' 
-----+------+------+------
   1 |    1 |    9 |   15
   2 |    2 |   12 |   20
(2 rows)
Hannes Landeholm
  • 1,525
  • 2
  • 17
  • 32