
I'm new to PostgreSQL and am using version 9.4. I have a table of collected measurements stored as strings and need to convert it into a kind of pivot table, using something that is always up to date, like a VIEW.
Furthermore, some values need to be converted, e.g. multiplied by 1000, as you can see in the example below for "sensor3".

Source Table:

CREATE TABLE source (
    id bigint NOT NULL,
    name character varying(255),
    "timestamp" timestamp without time zone,
    value character varying(32672),
    CONSTRAINT source_pkey PRIMARY KEY (id)
);

INSERT INTO source VALUES
  (15,'sensor2','2015-01-03 22:02:05.872','88.4')
, (16,'foo27'  ,'2015-01-03 22:02:10.887','-3.755')
, (17,'sensor1','2015-01-03 22:02:10.887','1.1704')
, (18,'foo27'  ,'2015-01-03 22:02:50.825','-1.4')
, (19,'bar_18' ,'2015-01-03 22:02:50.833','545.43')
, (20,'foo27'  ,'2015-01-03 22:02:50.935','-2.87')
, (21,'sensor3','2015-01-03 22:02:51.044','6.56');

Source Table Result:

| id | name      | timestamp                 | value    |
|----+-----------+---------------------------+----------|
| 15 | "sensor2" | "2015-01-03 22:02:05.872" | "88.4"   |
| 16 | "foo27"   | "2015-01-03 22:02:10.887" | "-3.755" |
| 17 | "sensor1" | "2015-01-03 22:02:10.887" | "1.1704" |
| 18 | "foo27"   | "2015-01-03 22:02:50.825" | "-1.4"   |
| 19 | "bar_18"  | "2015-01-03 22:02:50.833" | "545.43" |
| 20 | "foo27"   | "2015-01-03 22:02:50.935" | "-2.87"  |
| 21 | "sensor3" | "2015-01-03 22:02:51.044" | "6.56"   |

Desired Final Result:

| timestamp                 | sensor1 | sensor2 | sensor3 | foo27   | bar_18  |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:02:05.872" |         | 88.4    |         |         |         |
| "2015-01-03 22:02:10.887" | 1.1704  |         |         | -3.755  |         |
| "2015-01-03 22:02:50.825" |         |         |         | -1.4    |         |
| "2015-01-03 22:02:50.833" |         |         |         |         | 545.43  |
| "2015-01-03 22:02:50.935" |         |         |         | -2.87   |         |
| "2015-01-03 22:02:51.044" |         |         | 6560.00 |         |         |

Using this:

--    CREATE EXTENSION tablefunc;
SELECT *
    FROM
        crosstab(
            'SELECT
                source."timestamp",
                source.name,
                source.value
            FROM
                public.source
            ORDER BY
                1'
            ,
            'SELECT
                DISTINCT
                source.name
            FROM
                public.source
            ORDER BY
                1'
        )
    AS
        (
            "timestamp" timestamp without time zone,
            "sensor1" character varying(32672),
            "sensor2" character varying(32672),
            "sensor3" character varying(32672),
            "foo27" character varying(32672),
            "bar_18" character varying(32672)
        )
    ;

I got the result:

| timestamp                 | sensor1 | sensor2 | sensor3 | foo27   | bar_18  |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:02:05.872" |         |         |         | 88.4    |         |
| "2015-01-03 22:02:10.887" |         | -3.755  | 1.1704  |         |         |
| "2015-01-03 22:02:50.825" |         | -1.4    |         |         |         |
| "2015-01-03 22:02:50.833" | 545.43  |         |         |         |         |
| "2015-01-03 22:02:50.935" |         | -2.87   |         |         |         |
| "2015-01-03 22:02:51.044" |         |         |         |         | 6.56    |

Unfortunately,

  1. the values aren't assigned to the correct columns,
  2. the columns aren't dynamic, meaning the query fails when an additional entry such as 'sensor4' appears in the name column, and
  3. I don't know how to transform the values of some columns (multiply).

2 Answers


Your query would work like this:

SELECT * FROM crosstab(
  $$SELECT "timestamp", name
         , CASE name
           WHEN 'sensor3' THEN value::numeric * 1000
       --  WHEN 'sensor9' THEN value::numeric * 9000  -- add more ...
           ELSE value::numeric END AS value
    FROM   source
    ORDER  BY 1, 2$$
 ,$$SELECT unnest('{bar_18,foo27,sensor1,sensor2,sensor3}'::text[])$$
) AS (
  "timestamp" timestamp
, bar_18  numeric
, foo27   numeric
, sensor1 numeric
, sensor2 numeric
, sensor3 numeric);

To multiply the value for selected columns, use a "simple" CASE expression. But you need to cast to a numeric type first; the example uses value::numeric.
Which raises the question: why not store value as a numeric type to begin with?
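If the table is under your control, that conversion could be done once, for instance like this (a sketch; it assumes every stored value actually parses as a number):

```sql
-- One-time conversion of the column type, reusing the existing text values.
-- Any row whose value is not valid numeric text would make this fail.
ALTER TABLE source
    ALTER COLUMN value TYPE numeric USING value::numeric;
```

After that, the cast inside the crosstab source query is no longer needed.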

You need to use the version of crosstab() with two parameters. Detailed explanation:

A truly dynamic cross tabulation is next to impossible, since SQL demands to know the result type in advance, at call time at the latest. But you can do something with polymorphic types:


@Erwin: It said "too long by 7128 characters" for a comment! Anyway:

Your post gave me the hints for the right direction, so thank you very much, but in my particular case I need it to be truly dynamic. Currently I've got 38,886 rows with 49 different items (= columns to be pivoted).

To first answer yours and @Jasen's urgent question: the source table layout is not up to me; I'm already very happy to get this data into an RDBMS at all. If it were up to me, I'd always save UTC timestamps! But there's also a reason for having the data saved as strings: it may contain various data types, such as boolean, integer, float, and string.

To avoid confusing myself further, I created a new demo dataset, prefixing each identifier with its object type (I know some hate this!) to avoid problems with keywords, and spacing the timestamps one minute apart for a better overview:

--  --------------------------------------------------------------------------
--  Create demo table of given schema and insert arbitrary data
--  --------------------------------------------------------------------------

    DROP TABLE IF EXISTS table_source;

    CREATE TABLE table_source
    (
        column_id BIGINT NOT NULL,
        column_name CHARACTER VARYING(255),
        column_timestamp TIMESTAMP WITHOUT TIME ZONE,
        column_value CHARACTER VARYING(32672),
        CONSTRAINT table_source_pkey PRIMARY KEY (column_id)
    );

    INSERT INTO table_source VALUES ( 15,'sensor2','2015-01-03 22:01:05.872','88.4');
    INSERT INTO table_source VALUES ( 16,'foo27' ,'2015-01-03 22:02:10.887','-3.755');
    INSERT INTO table_source VALUES ( 17,'sensor1','2015-01-03 22:02:10.887','1.1704');
    INSERT INTO table_source VALUES ( 18,'foo27' ,'2015-01-03 22:03:50.825','-1.4');
    INSERT INTO table_source VALUES ( 19,'bar_18','2015-01-03 22:04:50.833','545.43');
    INSERT INTO table_source VALUES ( 20,'foo27' ,'2015-01-03 22:05:50.935','-2.87');
    INSERT INTO table_source VALUES ( 21,'seNSor3','2015-01-03 22:06:51.044','6.56');

    SELECT * FROM table_source;

Furthermore, based on @Erwin's suggestions, I created a view that already converts the data type. Besides being fast, this has the nice feature of only adding the required transformations for known items, without impacting other (new) items.

--  --------------------------------------------------------------------------
--  Create view to process source data
--  --------------------------------------------------------------------------

    DROP VIEW IF EXISTS view_source_processed;

    CREATE VIEW
        view_source_processed
    AS
        SELECT
            column_timestamp,
            column_name,
            CASE LOWER( column_name)
                WHEN LOWER( 'sensor3') THEN CAST( column_value AS DOUBLE PRECISION) * 1000.0
                ELSE CAST( column_value AS DOUBLE PRECISION)
            END AS column_value
        FROM
            table_source
    ;

    SELECT * FROM view_source_processed ORDER BY column_timestamp DESC LIMIT 100;
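Since column_value may also hold booleans or plain strings, the strict cast in the view above will error out on such rows. A defensive variant could return NULL for anything non-numeric instead (a sketch; the view name and the regex are my assumptions, and the pattern only accepts plain decimal notation, not scientific):

```sql
-- Sketch: NULL out non-numeric values instead of erroring on the cast.
-- Extend the regex if values like '1.2e-3' can occur.
CREATE VIEW view_source_processed_safe AS
SELECT
    column_timestamp,
    column_name,
    CASE
        WHEN column_value !~ '^-?[0-9]+(\.[0-9]+)?$' THEN NULL
        WHEN LOWER( column_name) = LOWER( 'sensor3')
            THEN CAST( column_value AS DOUBLE PRECISION) * 1000.0
        ELSE CAST( column_value AS DOUBLE PRECISION)
    END AS column_value
FROM
    table_source;
```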

This is the desired result of the whole question:

--  --------------------------------------------------------------------------
--  Desired result:
--  --------------------------------------------------------------------------

/*
| column_timestamp          | bar_18  | foo27   | sensor1 | sensor2 | seNSor3 |
|---------------------------+---------+---------+---------+---------+---------|
| "2015-01-03 22:01:05.872" |         |         |         |    88.4 |         |
| "2015-01-03 22:02:10.887" |         |  -3.755 |  1.1704 |         |         |
| "2015-01-03 22:03:50.825" |         |    -1.4 |         |         |         |
| "2015-01-03 22:04:50.833" |  545.43 |         |         |         |         |
| "2015-01-03 22:05:50.935" |         |   -2.87 |         |         |         |
| "2015-01-03 22:06:51.044" |         |         |         |         |    6560 |
*/

This is @Erwin's solution, adapted to the new demo source data. It's perfect as long as the items (= columns to be pivoted) don't change:

--  --------------------------------------------------------------------------
--  Solution by Erwin, modified for changed demo dataset:
--  http://stackoverflow.com/a/27773730
--  --------------------------------------------------------------------------

SELECT *
    FROM
        crosstab(
            $$
                SELECT
                    column_timestamp,
                    column_name,
                    column_value
                FROM
                    view_source_processed
                ORDER BY
                    1, 2
            $$
        ,
            $$
                SELECT
                    UNNEST( '{bar_18,foo27,sensor1,sensor2,seNSor3}'::text[])
            $$
        )
    AS
        (
            column_timestamp timestamp,
            bar_18  DOUBLE PRECISION,
            foo27   DOUBLE PRECISION,
            sensor1 DOUBLE PRECISION,
            sensor2 DOUBLE PRECISION,
            seNSor3 DOUBLE PRECISION
        )
    ;
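Note that with the two-parameter form, the column list in the AS clause must match the category query's output order position by position; if a new name such as 'sensor4' sorts in between, the AS clause has to be updated at the same position. A quick way to inspect the current category order (a sketch):

```sql
-- Categories in exactly the order crosstab() will fill the columns.
SELECT DISTINCT
    column_name
FROM
    view_source_processed
ORDER BY
    column_name;
```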

When reading through the links @Erwin provided, I found a dynamic SQL example by @Clodoaldo Neto and remembered that I had already done it this way in Transact-SQL. This is my attempt:

--  --------------------------------------------------------------------------
--  Dynamic attempt based on:
--  http://stackoverflow.com/a/12989297/131874
--  --------------------------------------------------------------------------

DO $DO$

DECLARE
    list_columns TEXT;

    BEGIN

        DROP TABLE IF EXISTS temp_table_pivot;

        list_columns := (
            SELECT
                string_agg( DISTINCT column_name, ' ' ORDER BY column_name)
            FROM
                view_source_processed
        );

        EXECUTE(
            FORMAT(
                $format_1$
                CREATE TEMP TABLE
                    temp_table_pivot(
                        column_timestamp TIMESTAMP,
                        %1$s
                    )
                $format_1$
            ,
                (
                    REPLACE(
                        list_columns,
                        ' ',
                        ' DOUBLE PRECISION, '
                    ) || ' DOUBLE PRECISION'
                )
            )
        );

        EXECUTE(
            FORMAT(
                $format_2$
                    INSERT INTO temp_table_pivot
                        SELECT
                            *
                        FROM crosstab(
                            $crosstab_1$
                            SELECT
                                column_timestamp,
                                column_name,
                                column_value
                            FROM
                                view_source_processed
                            ORDER BY
                                column_timestamp, column_name
                            $crosstab_1$
                        ,
                            $crosstab_2$
                            SELECT DISTINCT
                                column_name
                            FROM
                                view_source_processed
                            ORDER BY
                                column_name
                            $crosstab_2$
                        )
                        AS
                        (
                            column_timestamp TIMESTAMP,
                            %1$s
                        );
                $format_2$
            ,
                REPLACE( list_columns, ' ', ' DOUBLE PRECISION, ')
                ||
                ' DOUBLE PRECISION'
            )
        );

    END;

$DO$;

SELECT * FROM temp_table_pivot ORDER BY column_timestamp DESC LIMIT 100;
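One caveat in the DO block above: the aggregated names go into the generated column list unquoted, so a name with mixed case (like seNSor3) silently folds to lowercase, and a name containing a space or a reserved word would break the generated DDL. Wrapping each name with quote_ident() when building the list would guard against that (a sketch; note that with DISTINCT inside an aggregate, the ORDER BY expression must match the argument):

```sql
-- Sketch: build the column list with identifiers safely quoted.
-- Caveat: a quoted name containing a space would confuse the later
-- REPLACE( list, ' ', ...) step, which uses a space as delimiter.
SELECT
    string_agg(
        DISTINCT quote_ident( column_name),
        ' '
        ORDER BY quote_ident( column_name)
    )
FROM
    view_source_processed;
```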

Besides getting this into a stored procedure, I will, for performance reasons, try to adapt this to an intermediate table where only new values are inserted. I'll keep you up to date!

Thanks!!!

L.

PS: NO, I don't want to answer my own question, but the "comment"-field is too small!
