3

I am having difficulties writing a Postgres function, as I am not familiar with it. I have multiple tables to import into Postgres with this format:

id | 1960 | 1961 | 1962 | 1963 | ...
____________________________________
 1    23     45     87     99
 2    12     31    ...

which I need to convert into this format:

id | year | value
_________________
 1   1960    23
 1   1961    45
 1   1962    87
 ...
 2   1960    12
 2   1961    31
 ...

I would imagine the function to read something like this:

SELECT all-years FROM imported_table;
CREATE a new_table;
FROM min-year TO max-year LOOP
     EXECUTE "INSERT INTO new_table (id, year, value) VALUES (id, year, value)";
END LOOP;

However, I'm having real trouble writing the nitty-gritty details for this. It would be easier for me to do this in PHP, but I am convinced that it's cleaner to do it directly in a Postgres function.

The start and end years vary from table to table. Sometimes a table even has a column only for every fifth year or so.

Erwin Brandstetter
luftikus143
  • Version? `select version();` – Clodoaldo Neto Sep 02 '14 at 14:09
  • All columns `NOT NULL`? Also, column names starting with a digit are impossible without double-quoting. – Erwin Brandstetter Sep 02 '14 at 15:17
  • Thanks so much for the suggestions! I am running version 9.2.6. Important question: Isn't there any way to read the year columns in a more automated way? For one thing, there are fifty years which I would need to type by hand, and secondly, the start and end year can vary from table to table. – luftikus143 Sep 02 '14 at 15:18

5 Answers

4

PostgreSQL 9.3 offers neat JSON functions that can be used for such tasks without defining new functions or knowing the number of columns.

SELECT id, (k).key as year, (k).value as value FROM
  (SELECT j->>'id' as id, json_each_text(j) as k
    FROM (
       SELECT row_to_json(tbl) as j FROM tbl) 
    as q)
    as r
WHERE (k).key <> 'id';

http://sqlfiddle.com/#!15/1714b/13
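The query above returns id, year and value as text. As a rough sketch of how the result could be written to a new table with integer columns, the same JSON functions can be combined with LATERAL (also available from 9.3). The table name new_table and the assumption that id and all year columns hold integers are illustrative, not taken from the answer:

```sql
-- Sketch only: assumes tbl has an integer id column and that every
-- other column is named after a year and holds an integer (or NULL).
CREATE TABLE new_table AS
SELECT t.id,
       kv.key::int   AS year,   -- column name, e.g. '1960', cast to integer
       kv.value::int AS value   -- cell value, cast back from text
FROM   tbl t
CROSS  JOIN LATERAL json_each_text(row_to_json(t)) AS kv(key, value)
WHERE  kv.key <> 'id';
```

If the table has other non-year columns, the WHERE clause would need to exclude them as well, otherwise the cast to integer fails.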

Jakub Kania
  • This is brilliant: I have "periods" which need to be unpivoted. For some customers these are quarters (i.e. 4 periods), sometimes weeks (i.e. 52 or 53 weeks). And they all need to go into one DWH. This will do the trick. – MZA Oct 24 '19 at 09:34
2

Parallel unnesting might be easier:

select
    id,
    unnest(array[1960, 1961, 1962]) as year,
    unnest(array["1960", "1961", "1962"]) as value
from (values
    (1,23,45,87), (2,12,31,53)
) s(id, "1960", "1961", "1962")
;
 id | year | value 
----+------+-------
  1 | 1960 |    23
  1 | 1961 |    45
  1 | 1962 |    87
  2 | 1960 |    12
  2 | 1961 |    31
  2 | 1962 |    53
Clodoaldo Neto
1

A completely dynamic version requires dynamic SQL. Use a plpgsql function with EXECUTE:

For Postgres 9.2 or older (before LATERAL was implemented):

CREATE OR REPLACE FUNCTION f_unpivot_years92(_tbl regclass, VARIADIC _years int[])
  RETURNS TABLE(id int, year int, value int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE '
   SELECT id
        , unnest($1) AS year
        , unnest(ARRAY["'|| array_to_string(_years, '","') || '"]) AS val
   FROM   ' || _tbl || '
   ORDER  BY 1, 2'
   USING _years;
END
$func$  LANGUAGE plpgsql;

For Postgres 9.3 or later (with LATERAL):

CREATE OR REPLACE FUNCTION f_unpivot_years(_tbl regclass, VARIADIC _years int[])
  RETURNS TABLE(id int, year int, value int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE (SELECT
     'SELECT t.id, u.year, u.val
      FROM  ' || _tbl || ' t
      LEFT   JOIN LATERAL (
         VALUES ' || string_agg(format('(%s, t.%I)', y, y), ', ')
     || ') u(year, val) ON true
      ORDER  BY 1, 2'
      FROM   unnest(_years) y
      );
END
$func$  LANGUAGE plpgsql;

About VARIADIC:

Call for arbitrary years:

SELECT * FROM f_unpivot_years('tbl', 1961, 1964, 1963);

Same, passing an actual array:

SELECT * FROM f_unpivot_years('tbl', VARIADIC '{1960,1961,1962,1963}'::int[]);

For a long list of sequential years:

SELECT * 
FROM f_unpivot_years('t', VARIADIC ARRAY(SELECT generate_series(1950,2014)));

For a long list with regular intervals (example for every 5 years):

SELECT *
FROM f_unpivot_years('t', VARIADIC ARRAY(SELECT generate_series(1950,2010,5)));

Output as requested.

The function takes:
1. A valid table name, double-quoted if it is otherwise illegal (like '"CaMeL"'). It uses the object identifier type regclass to assert correctness and defend against SQL injection. You may want to schema-qualify the table name to be unambiguous (like 'public."CaMeL"').

2. Any list of numbers corresponding to (double-quoted) column names, or an actual array, prefixed with the keyword VARIADIC.

The array of columns does not have to be sorted in any way, but table and columns must exist or an exception is raised.
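
Since the year columns differ from table to table, the array does not have to be typed by hand. One possible approach, sketched here as an assumption rather than as part of the original answer, is to read the column names from the system catalog pg_attribute and keep only those that look like four-digit years:

```sql
-- Sketch only: assumes a table tbl whose year columns are named with
-- exactly four digits, and that f_unpivot_years() from above exists.
SELECT *
FROM   f_unpivot_years('tbl', VARIADIC ARRAY(
          SELECT attname::text::int          -- column name as integer year
          FROM   pg_attribute
          WHERE  attrelid = 'tbl'::regclass  -- columns of this table only
          AND    attnum > 0                  -- skip system columns
          AND    NOT attisdropped            -- skip dropped columns
          AND    attname ~ '^[0-9]{4}$'      -- keep year-like names
          ORDER  BY 1));
```

The same subquery should work with f_unpivot_years92() on 9.2, since it uses neither LATERAL nor JSON functions.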

Output is sorted by id and year (as integer). If you want years to be sorted according to the sort order of the input array, make it just ORDER BY 1. Sort order according to the array is not strictly guaranteed, but works in the current implementation.

Also works for NULL values.

SQL Fiddle for both with examples.

Community
Erwin Brandstetter
  • Thanks a lot for this. That looks great. However, although it runs correctly in the SQLFiddle, it creates an error warning in my database: "syntax error at or near "CREATE"". Not sure if I entered the fields correctly: NAME unpivot; RETURNS numeric; MODE inout; TYPE numeric... or what this is. Do you have any idea? – luftikus143 Sep 05 '14 at 05:43
  • @luftikus143: `RETURNS numeric`? Why? What is this? Just copy / paste my code and it should work. – Erwin Brandstetter Sep 05 '14 at 05:51
  • Ah, got it now. Sorry! Works perfectly! Thanks so much! – luftikus143 Sep 05 '14 at 09:16
0

The simplest way is a union all:

-- tbl stands in for your table name ("table" itself is a reserved word)
select id, 1960 as year, "1960" as value
from tbl t
union all
select id, 1961, "1961"
from tbl t
. . .;

A somewhat more sophisticated way would be:

-- tbl stands in for your table name ("table" itself is a reserved word)
select t.id, s.yr,
       (case when s.yr = 1960 then "1960"
             when s.yr = 1961 then "1961"
             . . .
        end) as value
from tbl t cross join
     generate_series(1960, 1980) s(yr);

You can put this into another table using insert or create table as.
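
As a minimal sketch of both options, assuming a source table tbl with columns id, "1960", "1961" and "1962" and a target called new_table (both names are placeholders):

```sql
-- Option 1: create the target table directly from the query.
CREATE TABLE new_table AS
SELECT id, 1960 AS year, "1960" AS value FROM tbl
UNION ALL
SELECT id, 1961, "1961" FROM tbl;

-- Option 2: append further rows to a table that already exists.
INSERT INTO new_table (id, year, value)
SELECT id, 1962, "1962" FROM tbl;
```

CREATE TABLE AS takes the column names and types from the first SELECT, which is why that branch carries the aliases.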

Gordon Linoff
0

Here's a way that's based on the method at https://blog.sql-workbench.eu/post/dynamic-unpivot/ and another answer to this question, but uses the hstore extension:

SELECT
  id,
  r.key AS year,
  r.value AS value
FROM
  imported_table t
CROSS JOIN
  each(hstore(t.*)) AS r(key, value)
WHERE
  -- This chooses columns that look like years
  -- In other cases you might need a different condition
  r.key ~ '^[0-9]{4}$'

It has a few benefits over other solutions:

  • By using hstore and not jsonb, it hopefully minimises issues with type conversions (although hstore does convert everything to text)
  • The columns don't need to be hard coded or known in advance. Here, columns are chosen by a regex on the name, but you could use any SQL logic based on the name (or even the value)
  • It doesn't require PL/pgSQL - it's all SQL
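
Because hstore returns keys and values as text, the result usually needs casting back before it goes into a typed table. A sketch of the same query with the extension installed and integer casts added, assuming all year columns in imported_table hold integers:

```sql
-- Sketch only: installing the extension typically needs elevated privileges.
CREATE EXTENSION IF NOT EXISTS hstore;

SELECT t.id,
       r.key::int   AS year,    -- column name cast to integer
       r.value::int AS value    -- text value cast back to integer
FROM   imported_table t
CROSS  JOIN LATERAL each(hstore(t)) AS r(key, value)
WHERE  r.key ~ '^[0-9]{4}$'     -- only columns that look like years
ORDER  BY t.id, year;
```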
Michal Charemza