I'm using a foreign data wrapper to move a large chunk of data (with a very simple date transformation on one column) to a local database. Using a Django cursor (because I'm too lazy to pull out the credentials to create a raw psycopg2 cursor), I run this kind of query (anonymized and with a couple of joins removed, but otherwise identical to the original):
cursor.executemany(
    sql.SQL(
        """
        INSERT INTO local_table (
            foreign_key_id,
            other_foreign_key_id,
            datetime,
            comment
        )
        SELECT other_local_table.id,
               %s,
               (object_date + to_timestamp(object_time, 'HH24:MI')::time) at time zone '…',
               comment
        FROM imported_schema.remote_table
        JOIN other_local_table ON other_local_table.code = remote_table.code
        """
    ),
    [(dummy_id,)],
)
However, the local Postgres server always gets OOM-killed after a while. I was expecting Postgres to flush the new rows to disk to avoid running out of memory, but as far as I can tell that's just not happening: /var/lib/docker/volumes/vagrant_postgres_data only grows by a few MB, while resident memory use grows into the GBs. The local server does not have enough RAM to keep the entire result set in memory, so I need a solution that doesn't involve a more expensive hardware setup.
Do I need to set something like wal_sync_method or work_mem for this to work?
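If one of those turns out to be the fix, I'd plan to set it per session right before the insert. A minimal sketch of what I mean for work_mem, through the Django cursor (the '64MB' value is a made-up placeholder, not something from my config):

from django.db import connection

with connection.cursor() as cursor:
    # Placeholder value, not a recommendation; SET only affects this session.
    # work_mem limits per-operation memory for sorts/hashes.
    cursor.execute("SET work_mem = '64MB'")
    # ...then run the INSERT ... SELECT from above on the same connection.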
According to the docs, executemany should be the right tool for the job:
The function is mostly useful for commands that update the database: any result set returned by the query is discarded.
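Since I'm only passing a single parameter tuple anyway, I assume a plain execute() would behave the same; this is roughly what I mean (same query as above, dummy_id as before):

from django.db import connection

insert_sql = """
    INSERT INTO local_table (foreign_key_id, other_foreign_key_id, datetime, comment)
    SELECT other_local_table.id,
           %s,
           (object_date + to_timestamp(object_time, 'HH24:MI')::time) at time zone '…',
           comment
    FROM imported_schema.remote_table
    JOIN other_local_table ON other_local_table.code = remote_table.code
"""

with connection.cursor() as cursor:
    # One statement with one parameter tuple, so executemany() should be
    # equivalent to a plain execute() here.
    cursor.execute(insert_sql, [dummy_id])  # dummy_id as in the snippet above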
Both servers are running Postgres 10.6 in containers on Linux, with Django 2.1 on the local machine. I'm not using any extensions other than the FDW.
Explain plan:
Insert on local_table (cost=817872.44..818779.47 rows=25915 width=56)
  -> Subquery Scan on "*SELECT*" (cost=817872.44..818779.47 rows=25915 width=56)
     -> HashAggregate (cost=817872.44..818390.74 rows=25915 width=48)
        Group Key: other_local_table.id, 1, timezone('…'::text, (remote_table.object_date + (to_timestamp((remote_table.object_time)::text, 'HH24:MI'::text))::time without time zone)), remote_table.comment
        -> Nested Loop (cost=101.15..807974.88 rows=989756 width=48)
           -> Nested Loop (cost=0.57..60.30 rows=73 width=12)
              -> Nested Loop (cost=0.29..42.35 rows=38 width=4)
                 -> Seq Scan on fourth_local_table (cost=0.00..7.45 rows=1 width=4)
                    Filter: ((code)::text = '…'::text)
                 -> Index Scan using … on third_local_table (cost=0.29..34.49 rows=41 width=8)
                    Index Cond: (id = fourth_local_table.id)
              -> Index Scan using … on other_local_table (cost=0.29..0.45 rows=2 width=16)
                 Index Cond: (id = third_local_table.id)
           -> Foreign Scan on remote_table (cost=100.58..9421.44 rows=151030 width=20)
postgresqltuner suggests I
set vm.overcommit_memory=2 in /etc/sysctl.conf … This will disable memory overcommitment and avoid postgresql killed by OOM killer.
Is that the solution?