
I'm using a foreign data wrapper to move a large chunk of data (with a very simple date transformation on one column) to a local database. Using a Django cursor (because I'm too lazy to pull out the credentials to create a raw psycopg2 cursor), I run this kind of query (anonymized and with a couple of joins removed, but otherwise identical to the original):

cursor.executemany(
    sql.SQL(
        """
        INSERT INTO local_table (
            foreign_key_id,
            other_foreign_key_id,
            datetime,
            comment
        )
        SELECT other_local_table.id,
               %s,
               (object_date + to_timestamp(object_time, 'HH24:MI')::time) at time zone '…',
               comment
          FROM imported_schema.remote_table
          JOIN other_local_table ON other_local_table.code = remote_table.code
        """
    ),
    [(dummy_id,)],
)

However, the local Postgres server always gets OOM killed after a while. I was expecting Postgres to flush the new rows to disk to avoid running out of memory, but as far as I can tell that's just not happening: /var/lib/docker/volumes/vagrant_postgres_data grows by only a few MB while resident memory use grows into the GBs. The local server does not have enough RAM to keep the entire result set in memory, so I need a solution that doesn't involve a more expensive hardware setup.

Do I need to set something like wal_sync_method or work_mem for this to work?
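
If work_mem is the relevant knob, I could presumably set it for just this session before running the statement. A minimal sketch using the same Django cursor (the 64MB value is an arbitrary example, not something from my configuration):

# Sketch: set work_mem for this session only, before the big INSERT ... SELECT.
# The 64MB figure is an arbitrary example value, not a recommendation.
cursor.execute("SET work_mem = '64MB'")
# ... then run the INSERT ... SELECT above on the same cursor.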


According to the docs, executemany should be the right tool for the job:

The function is mostly useful for commands that update the database: any result set returned by the query is discarded.
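
Since I only pass a single parameter tuple, I assume the call is effectively the same as a single execute; a minimal sketch, where insert_select is a hypothetical name for the sql.SQL(...) statement above:

# Presumably equivalent, since executemany only receives one parameter tuple;
# insert_select is a hypothetical name for the sql.SQL(...) statement above.
cursor.execute(insert_select, [dummy_id])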


Running Postgres 10.6 containers on Linux on both servers and Django 2.1 locally. I'm not using any extensions other than FDW.


Explain plan:

Insert on local_table  (cost=817872.44..818779.47 rows=25915 width=56)
  ->  Subquery Scan on "*SELECT*"  (cost=817872.44..818779.47 rows=25915 width=56)
        ->  HashAggregate  (cost=817872.44..818390.74 rows=25915 width=48)
              Group Key: other_local_table.id, 1, timezone('…'::text, (remote_table.object_date + (to_timestamp((remote_table.object_time)::text, 'HH24:MI'::text))::time without time zone)), remote_table.comment
              ->  Nested Loop  (cost=101.15..807974.88 rows=989756 width=48)
                    ->  Nested Loop  (cost=0.57..60.30 rows=73 width=12)
                          ->  Nested Loop  (cost=0.29..42.35 rows=38 width=4)
                                ->  Seq Scan on fourth_local_table  (cost=0.00..7.45 rows=1 width=4)
                                      Filter: ((code)::text = '…'::text)
                                ->  Index Scan using … on third_local_table  (cost=0.29..34.49 rows=41 width=8)
                                      Index Cond: (id = fourth_local_table.id)
                          ->  Index Scan using … on other_local_table  (cost=0.29..0.45 rows=2 width=16)
                                Index Cond: (id = third_local_table.id)
                    ->  Foreign Scan on remote_table  (cost=100.58..9421.44 rows=151030 width=20)

postgresqltuner suggests I

set vm.overcommit_memory=2 in /etc/sysctl.conf … This will disable memory overcommitment and avoid postgresql killed by OOM killer.

Is that the solution?

l0b0
  • Can you show `EXPLAIN` output for the query? – Laurenz Albe Dec 04 '18 at 20:37
  • @LaurenzAlbe I've added it, even though I don't understand why that's relevant :) – l0b0 Dec 04 '18 at 20:47
  • My solution in these cases is to pick some criteria to chunk the data on so you're not dealing with quite as many rows at once (see the sketch after these comments). – Donnie Dec 04 '18 at 20:50
  • @Donnie I'd like to avoid that because it will be a fairly significant amount of work to split things into manageable chunks *and* the overall run time will suffer. – l0b0 Dec 04 '18 at 20:51
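
A minimal sketch of the chunking approach suggested in the comments, assuming object_date is a usable chunking key; the date range, the month-sized chunks, and the reuse of cursor and dummy_id from the question are all assumptions for illustration:

from datetime import date

def month_ranges(start, end):
    """Yield (first_day, start_of_next_month) pairs covering [start, end)."""
    current = start
    while current < end:
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        yield current, min(nxt, end)
        current = nxt

# Run one INSERT ... SELECT per month so each statement touches fewer rows.
for chunk_start, chunk_end in month_ranges(date(2017, 1, 1), date(2019, 1, 1)):
    cursor.execute(
        """
        INSERT INTO local_table (
            foreign_key_id,
            other_foreign_key_id,
            datetime,
            comment
        )
        SELECT other_local_table.id,
               %s,
               (object_date + to_timestamp(object_time, 'HH24:MI')::time) at time zone '…',
               comment
          FROM imported_schema.remote_table
          JOIN other_local_table ON other_local_table.code = remote_table.code
         WHERE object_date >= %s
           AND object_date < %s
        """,
        [dummy_id, chunk_start, chunk_end],
    )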

1 Answer


I don't see anything in your execution plan except the HashAggregate that could consume any appreciable amount of memory, and that should be limited by work_mem.

To diagnose this, you should first configure your system so that you get a regular OOM error rather than invoking the OOM killer. That means setting vm.overcommit_memory = 2 with sysctl and adjusting vm.overcommit_ratio to 100 * (RAM - swap) / RAM (for example, 75 on a machine with 8 GB of RAM and 2 GB of swap).

When the server gets an OOM error, it will dump the current memory contexts and their sizes to the PostgreSQL log. That should give an indication of where the memory is going. Add the dump to the question in case of doubt.

Are you using any third-party extensions?

Laurenz Albe