Below is a function that's part of a process used to periodically upload many CSVs (that can change) into a Postgres 9.6 db. Anyone have any suggestions on how to improve this function or other data upload processes you'd like to share?
The function works (I think), so I thought I'd share in case it would be helpful for someone else. As a total newbie, this took me flipping forever, so hopefully I can save someone some time.
I lifted code from various sources+++ to make this function, which inserts all of the columns in the source table that have matching destination table columns, casting the data type from the source columns into the data type of the destination columns during the insert. I plan to turn this into a trigger function(s) that executes upon update of the source table(s).
Bigger picture: 1) batch file runs dbf2csv to export DBFs into CSVs, 2) batch files run csvkit to load many CSVs into a separate tables in a schema called dataloader and add a new column for the CSV date, 3) the below function moves the data from the dataloader table to the main tables located in the public schema. I had thought about using PGloader, but I don't know Python. An issue that I will have is if new columns are added to the source CSVs (this function will ignore them), but I can monitor that manually as the columns don't change much.
+++ A few I can remember (thanks!)
I experimented with FDWs and can't remember why I didn't use this approach.
CREATE OR REPLACE FUNCTION dataloader.insert_des3 (
tbl_des pg_catalog.regclass,
tbl_src pg_catalog.regclass
)
RETURNS void AS
$body$
DECLARE
tdes_cols text;
tsrc_cols text;
BEGIN
SET search_path TO dataloader, public;
SELECT string_agg( c1.attname, ',' ),
string_agg( quote_ident( COALESCE( c2.attname, 'NULL' ) ) || '::' || format_type(c1.atttypid, c1.atttypmod), ',' )
INTO tdes_cols,
tsrc_cols
FROM pg_attribute c1
LEFT JOIN pg_attribute c2
ON c2.attrelid = tbl_src
AND c2.attnum > 0 --attnum is the column number of c2
AND NOT c2.attisdropped
AND c1.attname = lower(c2.attname)
WHERE c1.attrelid = tbl_des
AND c1.attnum > 0
AND NOT c1.attisdropped
AND c1.attname <> 'id';
EXECUTE format(
' INSERT INTO %I (%s)
SELECT %s
FROM %I
',
tbl_des,
tdes_cols,
tsrc_cols,
tbl_src
);
END;
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100;
To call the function
SELECT dataloader.insert_des('public.tbl_des','dataloader.tbl_src')