0

Below is a function that's part of a process used to periodically upload many CSVs (that can change) into a Postgres 9.6 db. Anyone have any suggestions on how to improve this function or other data upload processes you'd like to share?

The function works (I think), so I thought I'd share in case it would be helpful for someone else. As a total newbie, this took me flipping forever, so hopefully I can save someone some time.

I lifted code from various sources+++ to make this function, which inserts all of the columns in the source table that have matching destination table columns, casting the data type from the source columns into the data type of the destination columns during the insert. I plan to turn this into a trigger function(s) that executes upon update of the source table(s).

Bigger picture: 1) batch file runs dbf2csv to export DBFs into CSVs, 2) batch files run csvkit to load many CSVs into a separate tables in a schema called dataloader and add a new column for the CSV date, 3) the below function moves the data from the dataloader table to the main tables located in the public schema. I had thought about using PGloader, but I don't know Python. An issue that I will have is if new columns are added to the source CSVs (this function will ignore them), but I can monitor that manually as the columns don't change much.

+++ A few I can remember (thanks!)

Dynamic insert

Dynamic insert #2

Data type

More dynamic code

I experimented with FDWs and can't remember why I didn't use this approach.

Foreign data wrapper

CREATE OR REPLACE FUNCTION dataloader.insert_des3 (
  tbl_des pg_catalog.regclass,
  tbl_src pg_catalog.regclass
)
RETURNS void AS
$body$
DECLARE
    tdes_cols text;
    tsrc_cols text;

BEGIN
    SET search_path TO dataloader, public;
    SELECT  string_agg( c1.attname, ',' ), 
            string_agg( quote_ident( COALESCE( c2.attname, 'NULL' ) ) || '::' || format_type(c1.atttypid, c1.atttypmod), ',' ) 
    INTO    tdes_cols,
            tsrc_cols
    FROM    pg_attribute c1 
    LEFT JOIN    pg_attribute c2 
    ON      c2.attrelid = tbl_src  
    AND     c2.attnum > 0 --attnum is the column number of c2
    AND     NOT c2.attisdropped 
    AND     c1.attname = lower(c2.attname) 
    WHERE   c1.attrelid = tbl_des
    AND     c1.attnum > 0
    AND     NOT c1.attisdropped
    AND     c1.attname <> 'id';

    EXECUTE format(
        '   INSERT INTO %I (%s)
            SELECT %s
            FROM %I
        ',
        tbl_des,  
        tdes_cols, 
        tsrc_cols, 
        tbl_src 
    );

END;
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100;

To call the function
SELECT dataloader.insert_des('public.tbl_des','dataloader.tbl_src')

J Wells
  • 1
  • 1

0 Answers0