
I am writing Python scripts to synchronize tables from an MSSQL database to a PostgreSQL DB. The original author tends to use super-wide tables with long runs of consecutive NULLs in them.

For insertion speed, I serialize the records in bulk into a string of the following form before calling execute():

INSERT INTO A( {col_list} ) 
SELECT * FROM ( VALUES (row_1), (row_2),...) B( {col_list} )

During row serialization, it's not possible to determine the data type of a NULL/None value in Python, which complicates the job: every NULL in a timestamp column, integer column, etc. needs an explicit cast to the proper type, or Pg complains about it.

Currently I check the DB API cursor.description attribute, compare the type_code for every column, and add a type cast like ::timestamp where needed.

But this feels cumbersome and duplicates work: the driver has already converted the data from text into the proper Python types, and now I have to redo that for every column full of Nones.
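Simplified, my current workaround looks roughly like this (a sketch; I'm assuming pyodbc on the MSSQL side, where type_code in cursor.description is a Python type, and psycopg2's adapt for quoting; the cast table is illustrative, not complete):

import datetime
from psycopg2.extensions import adapt

# Illustrative type_code -> cast table (pyodbc reports Python types).
TYPE_CASTS = {
    datetime.datetime: '::timestamp',
    int: '::bigint',
}

def serialize_row(row, casts):
    parts = []
    for value, cast in zip(row, casts):
        if value is None:
            # None carries no type information, so it needs the explicit cast.
            parts.append('NULL' + cast)
        else:
            parts.append(adapt(value).getquoted().decode())
    return '(%s)' % ', '.join(parts)

def bulk_insert(pg_cur, table, col_list, description, rows):
    # description carries (name, type_code, ...) per column.
    casts = [TYPE_CASTS.get(col[1], '') for col in description]
    values = ',\n'.join(serialize_row(row, casts) for row in rows)
    pg_cur.execute(
        'INSERT INTO %s (%s) SELECT * FROM (VALUES %s) B (%s)'
        % (table, col_list, values, col_list)
    )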

Is there a better way to work around this with elegance and simplicity?

Ben
  • Do you really need the `SELECT`? In the example you've given, `INSERT INTO A (...) VALUES ...` should be enough. I think Postgres is less fussy about unknown parameter types when there's no query involved. – Nick Barnes Feb 28 '18 at 12:36

3 Answers


If you don't need the SELECT, go with @Nick's answer.
If you need it (like with a CTE to use the input rows multiple times), there are workarounds depending on the details of your use case.

Example, when working with complete rows:

INSERT INTO A -- complete rows
SELECT * FROM (
   VALUES ((NULL::A).*), (row_1), (row_2), ...
   ) B
OFFSET 1;

{col_list} is optional noise in this particular case, since we need to provide complete rows anyway.
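Wired into the Python script from the question, the technique only changes the statement shell, and no per-column casts are needed any more (a sketch, assuming psycopg2; pg_literal is a hypothetical helper):

from psycopg2.extensions import adapt

def pg_literal(value):
    # Plain literals suffice: the dummy row supplies the column types.
    return 'NULL' if value is None else adapt(value).getquoted().decode()

def insert_complete_rows(pg_cur, table, rows):
    values = ', '.join(
        '(%s)' % ', '.join(pg_literal(v) for v in row) for row in rows
    )
    # The leading (NULL::table).* row fixes the column types;
    # OFFSET 1 drops it again before insertion.
    pg_cur.execute(
        'INSERT INTO {t} '
        'SELECT * FROM (VALUES ((NULL::{t}).*), {v}) B '
        'OFFSET 1'.format(t=table, v=values)
    )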


Erwin Brandstetter
  • Thanks for the technique. I didn't use a CTE, but this row-type casting looks very helpful. I wish I had known it earlier. – Ben Mar 01 '18 at 00:16

You could try creating json from the data and then building a rowset from the json using json_populate_record(..).

postgres=# create table js_test (id int4, dat timestamp, val text);
CREATE TABLE

postgres=# insert into js_test
postgres-# select (json_populate_record(null::js_test,
postgres(# json_object(array['id', 'dat', 'val'], array['5', null, 'test']))).*;
INSERT 0 1

postgres=# select * from js_test;
 id | dat | val
----+-----+------
  5 |     | test

You can use json_populate_recordset(..) to do the same for multiple rows in one go. You just pass a single json value that is a JSON array of objects; make sure it isn't a PostgreSQL array of json values (json[]).

So this is OK: '[{"id":1,"dat":null,"val":6},{"id":3,"val":"tst"}]'::json

This is not: array['{"id":1,"dat":null,"val":6}'::json,'{"id":3,"val":"tst"}'::json]

select *
from json_populate_recordset(null::js_test,
                             '[{"id":1,"dat":null,"val":6},{"id":3,"val":"tst"}]')
Łukasz Kamiński
  • Hi Lukasz, it looks good, but with this json function I can't figure out how to insert multiple rows in one go without repeating the column names per row. – Ben Feb 28 '18 at 11:45
  • @Ben I have updated my answer to include function that will produce rowset in one call: `json_populate_recordset(..)` – Łukasz Kamiński Feb 28 '18 at 13:06
  • Thanks, but the repeated column names seem redundant. The tables in my case have dozens of columns, each periodically with many NULLs; the overhead seems a bit high. – Ben Mar 01 '18 at 00:10

Instead of inserting from a SELECT, you can attach a VALUES clause directly to the INSERT, i.e.:

INSERT INTO A ({col_list}) 
VALUES (row_1), (row_2), ...

When you insert from a query, Postgres examines the query in isolation when trying to infer the column types, and then tries to coerce them to match the target table (only to find out that it can't).

When you insert directly from a VALUES list, it knows about the target table when performing the type inference, and can then assume that any untyped NULL matches the corresponding column.
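With psycopg2, this shape also lets the manual string building go away entirely: execute_values from psycopg2.extras expands a single %s placeholder into a multi-row VALUES list (a sketch; other drivers ship similar batch helpers):

from psycopg2.extras import execute_values

def bulk_insert(pg_cur, table, col_list, rows):
    # Untyped NULLs in the rows are resolved against the target table,
    # so no per-column casts are needed.
    execute_values(
        pg_cur,
        'INSERT INTO {t} ({c}) VALUES %s'.format(t=table, c=col_list),
        rows,
    )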

Nick Barnes