
I have a large number of tuples that I want to insert into a SQL database. I would like to find a way to chunk the insert statements into a specified size, starting from a template like:

qry: str = 'insert into test_db (test_col) values ()'

My source data is an iterable of, say, 10M values. I'd like to chunk it so that for every n (chunk_size) values, I build a statement like:

# Source data for example (single column)
l: list[int] = [i for i in range(0,10000000)]
qry: str = 'insert into test_db (test_col) values (1), (2), (3), ..., (n)'

That way I could execute the statements in parallel via a thread pool or the like, with each INSERT writing chunk_size rows into my SQL db.
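
For reference, a minimal sketch of how this could look using only the standard library for the chunking and threading; `get_conn()`, the chunk size, and the worker count are placeholders, and the `%s` paramstyle assumes a psycopg2-style driver (MSSQL drivers such as pyodbc use `?` instead):

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator

CHUNK_SIZE = 10_000  # placeholder: tune to your driver / network


def chunked(values: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(values)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def insert_chunk(chunk: list[int]) -> None:
    # get_conn() is a stand-in for however you open a connection,
    # e.g. psycopg2.connect(...); each worker needs its own connection.
    conn = get_conn()
    try:
        with conn.cursor() as cur:
            # one multi-row VALUES statement per chunk, fully parameterised
            placeholders = ", ".join("(%s)" for _ in chunk)
            cur.execute(
                f"insert into test_db (test_col) values {placeholders}",
                chunk,
            )
        conn.commit()
    finally:
        conn.close()


source = range(10_000_000)  # stands in for the 10M-value iterable
with ThreadPoolExecutor(max_workers=4) as pool:
    # note: Executor.map submits every chunk up front, so the source is
    # consumed eagerly; that is fine for 1,000 chunks of 10,000 rows
    list(pool.map(insert_chunk, chunked(source, CHUNK_SIZE)))
```

Whether parallel inserts actually speed things up depends on the database and table (locking, indexes, write-ahead logging); for Postgres, the psycopg2 helpers or COPY mentioned in the comments below are usually the faster route.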

  • Does your database support parallel inserts? – smac89 Oct 08 '22 at 21:32
  • @smac89 yes, I'd be using something like Postgres or MSSQL. I want to parallelize my chunks of inserts in the most efficient way possible. – Coldchain9 Oct 08 '22 at 21:33
  • 2
    1) For Postgres using [psycopg2](https://www.psycopg.org/docs/) see [Fast execution helpers](https://www.psycopg.org/docs/extras.html#fast-execution-helpers) in particular `execute_batch` and `execute_values` 2) In general on Postgres you can make things faster using [Copy](https://www.postgresql.org/docs/current/sql-copy.html). – Adrian Klaver Oct 08 '22 at 21:39
  • If you want to break a list into chunks, I suggest https://stackoverflow.com/questions/434287/how-to-iterate-over-a-list-in-chunks If you want to break a generator into chunks, I suggest https://stackoverflow.com/questions/8991506/iterate-an-iterator-by-chunks-of-n-in-python – Nick ODell Oct 08 '22 at 21:50
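
Building on Adrian Klaver's comment above, psycopg2's `execute_values` already does the multi-row VALUES batching internally; a minimal sketch, assuming Postgres and psycopg2, with the connection string and `page_size` as placeholders:

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder DSN
try:
    # the connection context manager wraps everything in one transaction
    with conn, conn.cursor() as cur:
        execute_values(
            cur,
            "insert into test_db (test_col) values %s",
            ((i,) for i in range(10_000_000)),  # rows as 1-tuples
            page_size=10_000,  # rows folded into each generated statement
        )
finally:
    conn.close()
```

For raw bulk-load speed on Postgres, the COPY approach from the same comment (e.g. `cursor.copy_expert` with `COPY ... FROM STDIN`) is usually faster still, at the cost of formatting the data as CSV/text yourself.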

0 Answers