
I am trying to insert multiple records in a single database call with the psycopg2.extras.execute_values(cursor, statement, argument_list) function. It works when I have a list with only strings or integers as field values, but I want the id field to be assigned from a Postgres sequence. I tried using nextval('name_of_sequence') as a value, but it is treated as a string and is therefore not valid for my id column, which is a begint.

Bjinse
  • Is the `id` field a `serial/identity` field? If so, just leave it out of the insert column list. – Adrian Klaver Apr 07 '23 at 15:00
  • No, the id field is not defined as type SERIAL, it is just an integer, so I cannot leave it out of the insert. (I now see the typo. Instead of begint, I of course meant bigint. Dutch autocorrect kicking in...) – Bjinse Apr 11 '23 at 14:04

1 Answer

Two ways to deal with this.

1) Do this:

alter table <the_table> alter column id set default nextval('name_of_sequence');

Then you won't have to specify the id value in the query. If the sequence is dedicated to that table.column, then also:

alter sequence name_of_sequence owned by <table_name>.id;

That will create a dependency, so that if the table is dropped the sequence will be dropped as well.
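
For example, once the default is in place the id column can simply be omitted. A minimal sketch, assuming the default above has been applied to the py_seq_test.id column from the schema below, with the connection con and cursor cur from the Python code below:

from psycopg2.extras import execute_values

# assumes py_seq_test.id has default nextval('py_seq') set as above;
# the column default fills in id, so only fld_1 is supplied
execute_values(cur,
    "insert into py_seq_test (fld_1) values %s",
    [('test_a',), ('test_b',)])
con.commit()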

2)

Leave the sequence freestanding and just pull the values as needed, using the procedures below.

SQL schema

create table py_seq_test(id bigint, fld_1 varchar);
create sequence py_seq;

Python code

a) Regular execute

import psycopg2

con = psycopg2.connect("dbname=test host=localhost user=postgres")
cur = con.cursor()

cur.execute("insert into py_seq_test values(nextval('py_seq'), %s)", ['test1'])
cur.execute("insert into py_seq_test values(nextval('py_seq'), %s)", ['test2'])
con.commit()
cur.execute("select * from py_seq_test")
cur.fetchall()

[(1, 'test1'), (2, 'test2')]

b) execute_values

from psycopg2.extras import execute_values

execute_values(cur,
   "insert into py_seq_test values %s",
   [('test3',), ('test4',)],
   template="(nextval('py_seq'), %s)")

con.commit()

cur.execute("select * from py_seq_test")
cur.fetchall()

[(1, 'test1'), (2, 'test2'), (3, 'test3'), (4, 'test4')]


UPDATE

An example of how execute() compares to execute_values() when inserting 10,000 rows. This is done in IPython using the %%time cell magic.

large_list = [(val,) for val in range(10000)]

%%time
for i_val in large_list:
    cur.execute("insert into py_seq_test values(nextval('py_seq'), %s)", [i_val[0]])
con.commit()

CPU times: user 157 ms, sys: 103 ms, total: 260 ms
Wall time: 790 ms

%%time
execute_values(cur, 
   "insert into py_seq_test values %s", 
   large_list, 
   template="(nextval('py_seq'), %s)")
con.commit()

CPU times: user 30.6 ms, sys: 320 µs, total: 30.9 ms
Wall time: 164 ms

Adrian Klaver
  • I already implemented the Python a) variant. This worked for me, but when building a generic version it needed special attention to handle None/NULL values and strings containing single quotes. I will investigate whether the Python b) version better fits my needs. It seems the better way of doing this, using the template. Thanks! – Bjinse Apr 12 '23 at 15:45
  • I will rewrite the a) version to use parameters like the b) version and that should handle any issues with None and strings as the [parameter passing](https://www.psycopg.org/docs/usage.html#passing-parameters-to-sql-queries) process will take care of type adaptation and string escaping. – Adrian Klaver Apr 12 '23 at 15:54
  • In https://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query?rq=1 I just read that there is a significant performance difference, which leads me to keep using method a), though of course building one big insert statement that inserts multiple rows in one go. – Bjinse Apr 12 '23 at 16:01
  • The performance issue is with `executemany()` as noted here [Fast execution helpers](https://www.psycopg.org/docs/extras.html#fast-execution-helpers): *The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing.* `execute_batch()` and `execute_values()` were created to solve that issue. Trust me, if you have a lot of rows to enter, `execute_values()` will outperform iterating over `execute()`. For bulk loading, [Copy](https://www.psycopg.org/docs/usage.html#using-copy-to-and-copy-from) will be even faster. – Adrian Klaver Apr 12 '23 at 16:05
  • See **UPDATE** for timing of `execute()` vs `execute_values()`. – Adrian Klaver Apr 12 '23 at 18:05
  • Isn't execute() with a single insert statement containing a list of records just as fast? – Bjinse Apr 14 '23 at 10:41
  • Plus the time it takes to build the statement with the correct number of placeholders; `execute_values()` takes care of that for you (a sketch of the manual construction follows below). – Adrian Klaver Apr 14 '23 at 15:13
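
For reference, a sketch of the manual construction discussed above (hypothetical values, reusing the py_seq_test table and py_seq sequence): one multi-row INSERT is built by hand and run with a single execute(), which is exactly the placeholder bookkeeping execute_values() handles for you, while psycopg2's parameter adaptation still takes care of None and embedded quotes.

# hypothetical sketch: build one INSERT with a placeholder group per row,
# then run it with a single execute(); parameter adaptation handles
# None and strings containing single quotes
rows = [('test5',), (None,), ("o'brien",)]
placeholders = ", ".join(["(nextval('py_seq'), %s)"] * len(rows))
sql = "insert into py_seq_test values " + placeholders
params = [val for row in rows for val in row]
cur.execute(sql, params)
con.commit()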