3

I have a table like this

INPUT

id    author    size    file_ext
--------------------------------
1     a         13661   python
1     a         13513   cpp
1     a         1211    non-code
2     b         1019    python
2     b         6881    cpp
2     b         1525    python
2     b         1208    non-code
3     c         1039    python 
3     c         1299    cpp

I want to be able to pivot this table in the following manner

OUTPUT

id    author    size    python    cpp    non-code
-------------------------------------------------
1     a         13661   1         0      0
1     a         13513   0         1      0
1     a         1211    0         0      1 
2     b         1019    1         0      0
2     b         6881    0         1      0
2     b         1525    1         0      0
2     b         1208    0         0      1
3     c         1039    1         0      0
3     c         1299    0         1      0

All the articles that I can find online pivot tables based on a second column. My ultimate goal is to get one records per ID.

FINAL OUTPUT

id    author    size    python    cpp    non-code
-------------------------------------------------
1     a         28385   1         1      1
2     b         10633   2         1      1
3     c         2338    1         1      0

Here the values of the size, python, cpp, non-code columns are aggregated.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
sm1994
  • 335
  • 3
  • 12

3 Answers3

4

With conditional aggregation:

select 
  id, author,
  sum(size) size,
  sum((file_ext = 'python')::int) python,
  sum((file_ext = 'cpp')::int) cpp,
  sum((file_ext = 'non-code')::int) "non-code"
from tablename
group by id, author

See the demo.
Results:

> id | author |  size | python | cpp | non-code
> -: | :----- | ----: | -----: | --: | -------:
>  1 | a      | 28385 |      1 |   1 |        1
>  2 | b      | 10633 |      2 |   1 |        1
>  3 | c      |  2338 |      1 |   1 |        0
forpas
  • 160,666
  • 10
  • 38
  • 76
  • You have all the possible file extensions hard coded into the query. Is there a way to do it if you don't know all the possible file extensions? – djsosofresh Jun 16 '21 at 22:41
  • 1
    @djsosofresh Postgresql does not support (at the moment as far as I know) dynamic pivoting. Check the link in Mohamad's answer for an extension function. – forpas Jun 17 '21 at 07:10
  • Thanks for the reply. It looks like the crosstab function mentioned in Mohomad's answer also requires the names and types of the output columns to be defined. – djsosofresh Jun 17 '21 at 22:22
3

While you want to avoid the crosstab() function, use the aggregate FILTER clause for best performance and clearest code:

SELECT id, author
     , sum(size) AS size
     , count(*) FILTER (WHERE file_ext = 'python')   AS python
     , count(*) FILTER (WHERE file_ext = 'cpp')      AS cpp
     , count(*) FILTER (WHERE file_ext = 'non-code') AS "non-code"
FROM   tablename
GROUP  BY id, author;

That's the fastest way with just aggregate functions. See:

For absolute best performance, crosstab() is typically faster - even though more verbose in this case:

SELECT id, author, size
     , COALESCE(python    , 0) AS python
     , COALESCE(cpp       , 0) AS cpp
     , COALESCE("non-code", 0) AS "non-code"
FROM   crosstab(
$$
SELECT id, author
     , sum(sum(size)) OVER (PARTITION BY id) AS size
     , file_ext
     , count(*) AS ct
FROM   tablename
GROUP  BY id, author, file_ext
ORDER  BY id, author, file_ext
$$
, $$VALUES ('python'), ('cpp'), ('non-code')$$
) AS (id int, author text, size numeric
    , python int, cpp int, "non-code" int);

Same result.

db<>fiddle here - with intermediary steps.

Detailed explanation:

For the window function over the aggregate function (sum(sum(size)) OVER (...)), see:

Note a subtle difference if there should be more than one author for the same id: while the first query returns multiple rows in this case, the crosstab() variant just picks the first author.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Most of the time when I have a question about Postgresql I stumble upon one of your high quality answers. This is highly appreciated. Thanks! – Ludovic Kuty Aug 11 '21 at 10:34
0

You can also use the PostgreSQL crosstab function extension called tablefunc. I advise you to take a look at this link (the example given, there, is very similar to the result you want):https://vertabelo.com/blog/creating-pivot-tables-in-postgresql-using-the-crosstab-function/

Mohamad TAGHLOBI
  • 581
  • 5
  • 11