We have a table m1
with millions of records. We would like to produce a table m2
with calculations on every record in m1
.
We're currently running it as follows:
(jdbc/with-db-transaction [tx connection]
(jdbc/query tx
[(jdbc/prepare-statement (:connection tx)
"select * from m1"
{:fetch-size 1000})]
{:result-set-fn (process! [tx result-set] ...)}))
where process!
is
(defn process! [tx result-set]
(jdbc/with-db-transaction [tx tx]
(jdbc/insert-multi! tx :m2 [:m2_column]
(mapv (fn [r] [(calculate r)])
result-set))))
The select
query is using cursors and is consumed lazily. See: clojure.java.jdbc lazy query. That's the reason it is wrapped inside the outer transaction.
Questions:
- Is a (nested) transaction with millions of records problematic for Postgres? The clojure.java.jdbc docs say nested transactions are absorbed by the outer one, so effectively we only have one transaction. Is this correct?
- If we wanted to have separate transactions for the inserts, is the solution to use a different connection to the database? We already use connection pooling, so this might already be the case?