3

I have a Postgres tabled called locations. It has a few million rows with data in the following format

 id |  location_a  |  location_b
----+--------------+--------------
 36 | Sydney       | London
 37 | Atlanta      | London
 38 | New York     | Tokyo
 39 | Tokyo        | Sydney
 40 | Tokyo        | Sydney
.....

I want to be able to generate a pivot table / count of the following form -

enter image description here

The issue is that the number of columns is variable, so it has to be programmatically / dynamically determined instead of using a static SELECT query.

I understand the basic concept of PL/pgSQL in that it is a scripting language that will let me do dynamic things like this.

But I'm having a lot of trouble getting started. Is there an easy way to compute the above?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user2490003
  • 10,706
  • 17
  • 79
  • 155
  • Look into using [cross tabs](https://stackoverflow.com/questions/28562244/invalid-count-and-sum-in-cross-tab-query-using-postgresql), but even in that case you would still have to list the city names in the query. – Tim Biegeleisen Sep 01 '17 at 04:00
  • There is no way in postgres to achieve your desired result as far as I know. If you choose function to create a dynamic table then again you have to set return type table and all the columns you want to return. But you do not want to do that as you need a dynamic column. Postgres is not like MySQL in terms of executing query and generate table according to that. – Fahad Anjum Sep 01 '17 at 09:51
  • If by *"construct a table with dynamic columns"* you mean `CREATE TABLE`, then it can be done. If you are aiming for a table-function returning a dynamic number of columns directly, then you are out of luck: that cannot be done. You could return aggregated data in an array or json etc., though. Related: https://stackoverflow.com/a/15514334/939860. Start by clarifying what you need *exactly*, disclose your version of Postgres and provide an exact definition of the source table (`CREATE TABLE ...`). – Erwin Brandstetter Sep 03 '17 at 00:56

1 Answers1

2

You can dynamically create a view. The idea and the solution for a simpler case than yours is described in this answer. Please read it before continuing.

We will use the following query to create a view:

with all_locations(location) as (
    select distinct location_a
    from locations
    union
    select distinct location_b
    from locations
)

select location_a as location, json_object_agg(location_b, count order by location_b) as data
from (
    select a.location as location_a, b.location as location_b, count(l.*)
    from all_locations a
    cross join all_locations b
    left join locations l on location_a = a.location and location_b = b.location
    group by 1, 2
    ) s
group by 1
order by 1;

Results:

 location |                                    data                                    
----------+----------------------------------------------------------------------------
 Atlanta  | { "Atlanta" : 0, "London" : 1, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 London   | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 New York | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 0, "Tokyo" : 1 }
 Sydney   | { "Atlanta" : 0, "London" : 1, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 Tokyo    | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 2, "Tokyo" : 0 }
(5 rows)

The list of cities will be used twice inside the function, so it is stored in the array cities. Note, that you can replace the first query in the function to a simpler one (it is simply ordered list of distinct cities).

create or replace function create_locations_view()
returns void language plpgsql as $$
declare
    cities text[];
    list text;
begin
--  fill array with all cities in alphabetical order
    select array_agg(location_a order by location_a)
    from (
        select distinct location_a
        from locations
        union
        select distinct location_b
        from locations
        ) s
    into cities;

--  construct list of columns to use in select list
    select string_agg(format($s$data->>'%1$s' "%1$s"$s$, city), ', ')
    from unnest(cities) city
    into list;

--  create view from select based on the above list
    execute format($ex$
        drop view if exists locations_view;
        create view locations_view as 
        select location, %1$s
        from (
            select location_a as location, json_object_agg(location_b, count order by location_b) as data
            from (
                select a.location as location_a, b.location as location_b, count(l.*)
                from unnest(%2$L::text[]) a(location)
                cross join unnest(%2$L::text[]) b(location)
                left join locations l on location_a = a.location and location_b = b.location
                group by 1, 2
                ) s
            group by 1
        ) s
        order by 1
        $ex$, list, cities);
end $$;

Use the function and select data from the created view:

select create_locations_view();
select * from locations_view;

 location | Atlanta | London | New York | Sydney | Tokyo 
----------+---------+--------+----------+--------+-------
 Atlanta  | 0       | 1      | 0        | 0      | 0
 London   | 0       | 0      | 0        | 0      | 0
 New York | 0       | 0      | 0        | 0      | 1
 Sydney   | 0       | 1      | 0        | 0      | 0
 Tokyo    | 0       | 0      | 0        | 2      | 0
(5 rows)

I used this method several times but I have no experience with really large data, so I cannot guarantee that it is efficient.

klin
  • 112,967
  • 15
  • 204
  • 232
  • Thank you, this is a pretty creative solution! It's longer than I would have hoped for but it looks like there's not much support to do something like this in general. Appreciate the help. – user2490003 Sep 05 '17 at 22:03
  • 1
    Certain limitations and lack of ready-to-use tools stem from the foundations of database theory. Basically, such data transformations belong to the presentation layer and should be performed on the client side. Note that it is relatively easy to get the desired effect in your application by transforming the results of the first query (with the jsonb column). On the other hand, the function described in the answer really contains not much of code and is not particularly complicated, although requires some advanced knowledge. – klin Sep 05 '17 at 22:48