How to generate a list of DDL objects or functions that are deployed to or missing from different PostgreSQL databases

Question

I have been tasked with providing a "query" to a teammate who want an easy method of validating that all objects have been deployed to QA or Production PostgreSQL database servers.

Specifically the requester wants to be able to confirm that all tables, functions, sequences, indexes, views, schemas, and triggers exist in each environment and/or which ones are missing from one environment or another when new applications are deployed.

So far I have found queries for identifying indexes and the columns/column order they are in (Position of the column in the index), can query tables in information_schema for other objects, and have learned about tools for generating full diffs of the schemas (How to check difference between two databases in PostgreSQL?). The useful tool apgdiff gives SQL output to synchronize databases, but I did not see how to get a summary of objects that were different and the requester just wants a spreadsheet of what is or isn't there so that missing objects can be deployed. Once all objects are verified to be in existence apgdiff or other tool can be used to further examine that the full table definitions and function & trigger code are identical, but that will be a later task.

My initial attempt for this, without indexes or function parameters is a UNION query to be run separately on each environment, but code to run this against all environments and combine the results is desired, too.

select routine_name as object_name, routine_schema as schema_name, 'routine' as object_type, 'yes' as object_exists from information_schema.routines where routine_schema in ( 'shema1','schema2','schema3' ) union 
select schema_name as object_name, schema_name as schema_name, 'schema' as object_type, 'yes' as object_exists from information_schema.schemata where schema_name in ( 'shema1','schema2','schema3' ) union 
select sequence_name as object_name, sequence_schema as schema_name, 'sequence' as object_type, 'yes' as object_exists from information_schema.sequences where sequence_schema in ( 'shema1','schema2','schema3' ) union 
select table_name as object_name, table_schema as schema_name, 'table' as object_type, 'yes' as object_exists from information_schema.tables where table_schema in ( 'shema1','schema2','schema3' ) union 
select trigger_name as object_name, trigger_schema as schema_name, 'trigger' as object_type, 'yes' as object_exists from information_schema.triggers where trigger_schema in ( 'shema1','schema2','schema3' ) union
select table_name as object_name, table_schema as schema_name, 'view' as object_type, 'yes' as object_exists from information_schema.views where table_schema in ( 'shema1','schema2','schema3' ) 
order by object_type, schema_name, object_name;

Rather then "diffing" schemas, you should use a proper schema migration tool. Then the tool would know what to apply. Have a look a Liquibase or Flyway — , Jan 05 '17 at 22:23
@a_horse_with_no_name, thank you for the tip. I had seen reference to Liquibase, but did not know about Flyway. I agree with you and hope to move the entire application environment that direction, but there are differences that are known to exist and members of the team that want to see an overview of what those differences are to coordinate upcoming deployments. — Dan, Jan 05 '17 at 22:34

Dan · Answer 1 · 2017-01-05T22:29:03.703

After a few revisions this is what I ended up with. It uses the dblink extension (required as a pre-requisite, and in this case installed in a schema named "extension_data") to query three different environments from one session in pgAdmin, psql, etc. The output shows the object name, object type, and which of the three environments have the object loaded. In the cases of indexes I included the table name, type (USING clause) and ordered column list rather than the name since those are the important parts of the index and system generated names could theoretically be different. I also included a column for "object_info" which contains CREATE INDEX commands for indexes that don't exists for easy access and the description of the various pg_settings settings that are also queried.

do $$
-- Two changes are needed below: 
--      (1) the schema list needs to be updated
--      (2) the dblink connection information for each environment needs to be updated
-- Execute this in pgadmin while connected to one of the environments.  If the username and 
-- password are the same in all three environments then they do not need to be listed in 
-- the dblink connection strings below
declare
    rc bigint := 0;
    r RECORD;
    -- Necessary change #1: put the schemas you care about here
    v_schema_list text := '''schema1'',''schema2'',''schema3'',''schema4''';
    -- This is a large query looking at database settings, functions (routines), schemas, sequences, tables, triggers, views, and indexes
    -- For functions, the parameter list is important since the same function name can exist with a different number of parameters 
    --      or the parameters in a different order.  This query is not looking at parameter defaults.
    -- For indexes, the name of the index isn't important to a functioning system, but the type of index and column order is
    -- For tables, this only looks for the existence of the table, not the column order nor constraints, but those could be added later
    v_sql text := 'select name||'' = ''||setting as object_name, ''setting'' as object_type, category||'': ''||short_desc as object_info 
                          from pg_settings 
                   union
                   select routine_schema||''.''||routine_name||''(''||coalesce(string_agg(parameters.parameter_name||'' ''||parameters.data_type,'', ''),'''')||'')'' as object_name
                        , ''routine'' as object_type
                        , ''''::text as object_info
                        from information_schema.routines 
                        left join lateral
                            (select parameter_name, parameters.data_type, parameters.ordinal_position
                            from information_schema.parameters 
                            where parameters.specific_schema = routines.specific_schema
                            and parameters.specific_name = routines.specific_name
                            order by ordinal_position) parameters
                            on true
                        where routine_schema in ('||v_schema_list||') 
                        group by routine_name, routine_schema
                   union 
                   select schema_name||''.''||schema_name as object_name, ''schema'' as object_type, ''''::text as object_info 
                        from information_schema.schemata where schema_name in ('||v_schema_list||') 
                   union 
                   select sequence_schema||''.''||sequence_name as object_name, ''sequence'' as object_type, ''''::text as object_info 
                        from information_schema.sequences where sequence_schema in ('||v_schema_list||') 
                   union 
                   select table_schema||''.''||table_name as object_name, ''table'' as object_type, ''''::text as object_info 
                        from information_schema.tables where table_schema in ('||v_schema_list||') 
                   union 
                   select trigger_schema||''.''||trigger_name as object_name, ''trigger'' as object_type, ''''::text as object_info 
                        from information_schema.triggers where trigger_schema in ('||v_schema_list||') 
                   union
                   select table_schema||''.''||table_name as object_name, ''view'' as object_type, ''''::text as object_info 
                        from information_schema.views where table_schema in ('||v_schema_list||') 
                   union 
                   select substring(indexdef,3+position(''ON'' in indexdef)) as object_name, ''index'' as object_type, indexdef as object_info 
                        from pg_indexes where schemaname in ('||v_schema_list||')
                   order by object_type, object_name; ';
begin
    drop table if exists object_list;
    drop table if exists object_comparison;
    create temp table object_list (object_name text, object_type text, object_info text, environment text);

    for r in 
        -- Necessary change #2: update connection information for each database here
        select 'prod' as conn, 'dbname=your_prod_db_name port=5432 host=your_prod_server username=your_prod_user password=your_prod_password' as conn_string union
        select 'qa' as conn, 'dbname=your_qa_db_name port=5432 host=your_qa_server username=your_qa_user password=your_qa_password' as conn_string union 
        select 'dev' as conn, 'dbname=your_dev_db_name port=5432 host=your_dev_server username=your_dev_user password=your_dev_password' as conn_string
    loop 
        begin
            perform extension_data.dblink_disconnect(r.conn);    
        exception when others then 
            null;
        end;
        perform extension_data.dblink_connect(r.conn, r.conn_string);

        perform extension_data.dblink_open(r.conn, 'object_list',v_sql);
        GET CURRENT DIAGNOSTICS rc := ROW_COUNT;

        while rc > 0 loop 
            insert into object_list 
                SELECT *, r.conn as environment 
                FROM extension_data.dblink_fetch(r.conn, 'object_list', 500) AS (object_name text, object_type text, object_info text);
            GET CURRENT DIAGNOSTICS rc := ROW_COUNT;
        end loop;

        perform extension_data.dblink_close(r.conn, 'object_list');

        perform extension_data.dblink_disconnect(r.conn);    
    end loop;

    create temp table object_comparison as (
        select coalesce(dev.object_name,coalesce(qa.object_name,prod.object_name)) as object_name
             , coalesce(dev.object_type,coalesce(qa.object_type,prod.object_type)) as object_type
             , dev.environment as dev
             , qa.environment as qa
             , prod.environment as prod
             , coalesce(prod.object_info,coalesce(qa.object_info,dev.object_info)) as object_info
        from (select * from object_list where environment = 'dev')  dev
        full outer join (select * from object_list where environment = 'qa')  qa 
            on dev.object_name = qa.object_name 
            and dev.object_type = qa.object_type 
        full outer join (select * from object_list where environment = 'prod')  prod 
            on coalesce(dev.object_name,qa.object_name) = prod.object_name 
            and coalesce(dev.object_type,qa.object_type) = prod.object_type
    );
end;
$$ language plpgsql;

select * from object_comparison where dev is null or qa is null or prod is null;

How to generate a list of DDL objects or functions that are deployed to or missing from different PostgreSQL databases

1 Answers1