
I have the following Postgres query that runs over a large table, and whenever it runs I observe high CPU load that reaches 100%:

SELECT id, user_id, type
FROM table ss
WHERE end_timestamp > CURRENT_TIMESTAMP
AND (notification_channel1 = true OR notification_channel2 = true)
AND last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'
ORDER BY user_id, type, title

Table schema:

CREATE TABLE saved_search (
    id BIGINT PRIMARY KEY,
    user_id varchar(40) NOT NULL,
    title varchar(500) NOT NULL,
    start_timestamp timestamp with time zone NOT NULL,
    end_timestamp timestamp with time zone NOT NULL,
    last_notification_timestamp timestamp with time zone NOT NULL,
    notification_channel1 boolean NOT NULL DEFAULT TRUE,
    notification_channel2 boolean NOT NULL DEFAULT TRUE
);

I am thinking of using a covering index to speed up this query and hopefully avoid the CPU spike.

First question: would that be a valid path to follow?

The index I'm thinking of would be something like:

CREATE INDEX IX_COVERING
ON table (end_timestamp, notification_channel1, notification_channel2, last_notification_timestamp)
INCLUDE (id, user_id, type);

Would that be useful? Is the INCLUDE actually needed? Should I change the order of the columns in the index? Are there other or better approaches?

Here is the EXPLAIN ANALYZE output:

Gather Merge  (cost=387647.26..737625.73 rows=2999606 width=40) (actual time=36085.232..47805.119 rows=3634052 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Sort  (cost=386647.24..390396.74 rows=1499803 width=40) (actual time=35820.825..40497.683 rows=1211351 loops=3)
        Sort Key: user_id, type, title
        Sort Method: external merge  Disk: 63640kB
        Worker 0:  Sort Method: external merge  Disk: 63944kB
        Worker 1:  Sort Method: external merge  Disk: 57896kB
        ->  Parallel Seq Scan on table  (cost=0.00..150768.88 rows=1499803 width=40) (actual time=0.200..4176.269 rows=1211351 loops=3)
              Filter: ((notification_channel1 OR notification_channel2) AND (end_timestamp > CURRENT_TIMESTAMP) AND (last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'))
              Rows Removed by Filter: 136960
Planning Time: 3.632 ms
Execution Time: 48292.788 ms

With `SET work_mem = '100MB';`, here is the output I'm getting:

Gather Merge  (cost=305621.26..655599.73 rows=2999606 width=40) (actual time=48856.376..55606.264 rows=3634097 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Sort  (cost=304621.24..308370.74 rows=1499803 width=40) (actual time=46436.173..47563.732 rows=1211366 loops=3)
        Sort Key: user_id, type, title
        Sort Method: external merge  Disk: 72232kB
        Worker 0:  Sort Method: external merge  Disk: 55288kB
        Worker 1:  Sort Method: external merge  Disk: 57816kB
        ->  Parallel Seq Scan on table  (cost=0.00..150768.88 rows=1499803 width=40) (actual time=0.911..4643.228 rows=1211366 loops=3)
              Filter: ((notification_channel1 OR notification_channel2) AND (end_timestamp > CURRENT_TIMESTAMP) AND (last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'))
              Rows Removed by Filter: 136960
Planning Time: 0.450 ms
Execution Time: 56035.995 ms
  • Do you have the query plan? – Atmo Dec 29 '22 at 12:17
  • `last_notification_timestamp < CURRENT_TIMESTAMP`? Seems nonsensical? Also please: Your Postgres version, table definition, table size, cardinalities, what in the query is constant and what changes how? And the output of `EXPLAIN (ANALYZE, BUFFERS)`. – Erwin Brandstetter Dec 29 '22 at 12:47
  • That much is certain: The query involves `title`. While that's not in the index you won't get an index-only scan and the `INCLUDE` part is wasted. Not saying the index would make sense otherwise, just that it certainly doesn't as it stands. Order of index expressions also doesn't make sense. A partial index might be better. Depends on undisclosed information. – Erwin Brandstetter Dec 29 '22 at 12:53
  • Postgres 13. I updated the question (query and EXPLAIN ANALYZE output). hope it's clearer now. Thank you. – ah_ben Dec 29 '22 at 12:58
  • I have like 4 million rows in my table – ah_ben Dec 29 '22 at 13:01
  • `type` is missing in your table definition. – Erwin Brandstetter Dec 29 '22 at 13:22
  • You sort by `title`, which is defined as `varchar(500)`. If that's any indication, the column is rather big and makes the sort expensive. Do you need that sort step? Is something like `left(title, 15)` good enough? `user_id varchar(40)` seems wastefully big, too. Is that a UUID stored as string? – Erwin Brandstetter Dec 29 '22 at 13:25
  • `user_id` is something like: `95068145, 401229705, 407722349` etc – ah_ben Dec 29 '22 at 13:29
  • you're right. I think title sort can be avoided. – ah_ben Dec 29 '22 at 13:33
  • Increase whatever `work_mem` you had by 100MB. Once you see no more "Disk" in the query plan you have achieved the objective. – Erwin Brandstetter Dec 29 '22 at 13:33
  • Does this work_mem increase come with a (financial) cost to our infrastructure ? Should we scale up our DB instance in this case ? – ah_ben Dec 29 '22 at 13:37
  • Also, why perf seem to get worse by increasing work_mem ? – ah_ben Dec 29 '22 at 13:39
  • Sounds like you might need to hire a consultant to get your setup and DB schema straight. – Erwin Brandstetter Dec 29 '22 at 13:39
  • *"why perf seem to get worse by increasing work_mem"* - Did you actually increase `work_mem`? My initial advice of 100MB was not considering what you already had. (Also, going too high can actually hurt perf with limited resources as it starts competing with cache memory.) – Erwin Brandstetter Dec 29 '22 at 13:42
  • Your sort seems unreasonably slow. Was your server way overloaded when you did this? What is your collation? Is your tmp table space on very slow disks? – jjanes Dec 29 '22 at 16:36
  • @ErwinBrandstetter thanks for feedback. What is still not 100% clear to me is the following : am I reaching 100% CPU load because of the poor sorting perf or because my query is returning a LOT of records ? Or both ? My query does return around 2-3 million records !! – ah_ben Dec 29 '22 at 23:57
  • Can you replace the explain output with work_mem 100MB with one for 200MB (or however much is needed to make "Disk" disappear from the query plan)? So we can see the effect of sorting on disk vs. RAM. And how much RAM is available overall on your server? Also, compare each to the same query without `title` in `ORDER BY`. – Erwin Brandstetter Dec 30 '22 at 11:50

2 Answers


You have "like 4 million rows" in the table and the query selects rows=3634052. That's roughly 90% of all rows.

Long story short: an index is not going to help (much). Probably not worth the added storage and maintenance cost.

More work_mem would help performance - as indicated by:

Sort Method: external merge Disk: ...

You may not need more work_mem if you can optimize the ORDER BY.
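
If you do go the work_mem route, it does not have to be raised globally; a session- or transaction-level setting is enough. A minimal sketch (the 300MB figure is only a placeholder, raise it until "Disk" disappears from the sort lines of the plan):

BEGIN;
-- Applies to this transaction only; placeholder value, tune against EXPLAIN (ANALYZE).
SET LOCAL work_mem = '300MB';
-- ... run the SELECT with its ORDER BY here ...
COMMIT;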

user_id is something like: 95068145, 401229705, 407722349

That column should be integer or bigint, most likely. Much more efficient for storage and sorting.
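
If every stored value really is a plain number, the column can be converted in place. A sketch, assuming all existing values cast cleanly to bigint and nothing else (views, foreign keys, application code) depends on the current varchar type:

ALTER TABLE saved_search
  ALTER COLUMN user_id TYPE bigint
  USING user_id::bigint;  -- rewrites the table and takes an ACCESS EXCLUSIVE lock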

The big column title varchar(500) in your ORDER BY is probably expensive. Maybe something like left(title, 15) is good enough?
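
If so, the sort key can be trimmed to a prefix. A sketch of the query with that change (it assumes the type column that is missing from the posted table definition, and that ordering by a 15-character prefix is acceptable for your use case):

SELECT id, user_id, type
FROM saved_search
WHERE end_timestamp > CURRENT_TIMESTAMP
  AND (notification_channel1 = true OR notification_channel2 = true)
  AND last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'
ORDER BY user_id, type, left(title, 15);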

If you can bring down the size of ordering columns, an index (or partial index) on just those ordering columns may help some.
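
For example, an expression index matching an ORDER BY of user_id, type, left(title, 15) might look like this (the index name is made up; the partial WHERE clause only keeps rows that can ever pass the boolean filter):

CREATE INDEX saved_search_order_idx
ON saved_search (user_id, type, left(title, 15))
WHERE notification_channel1 OR notification_channel2;

The planner can only use it to avoid the sort if the query's ORDER BY uses exactly the same expressions.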


Either way, an upgrade to Postgres 15 will instantly improve performance. The release notes:

Performance improvements, particularly for in-memory and on-disk sorting.

Erwin Brandstetter
  • Actually if I remove the sorting from my query, I think I have a better cost : ``` Seq Scan on table (cost=0.00..197945.92 rows=3599527 width=24) (actual time=0.051..4313.913 rows=3634071 loops=1) Filter: ((notification_channel1 OR notification_channel2) AND (end_timestamp > CURRENT_TIMESTAMP) AND (last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days')) Rows Removed by Filter: 410881 Planning Time: 0.143 ms Execution Time: 4573.521 ms ``` – ah_ben Dec 29 '22 at 13:14
  • Try the query with `ORDER BY` again after `SET work_mem = '100MB'` in the same session ... – Erwin Brandstetter Dec 29 '22 at 13:16
  • I don't think it is better with work_mem. I updated original question (as outcome of explain is too big for a comment) – ah_ben Dec 29 '22 at 13:29

Your sort is incredibly slow. If I simulate data with those data types and the same row count and data volume as yours, my parallel seq scan is twice as slow but my sort is 10 times faster (you don't list the "type" column in your CREATE but do use it in the query, so I had to improvise). So maybe you are using a very slow collation, or there is something pretty sad about your hardware. Another thought: how long does it take if you do EXPLAIN (ANALYZE, TIMING OFF)? Maybe your system has slow access to the clock, and doing all those clock calls slows things down.
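
Two quick checks along those lines, as sketches (SHOW lc_collate reports the database default; individual columns could still carry a different collation):

-- A C/POSIX locale sorts text much faster than most libc/ICU locales.
SHOW lc_collate;

-- Re-run the plan without per-row timing to rule out slow clock access.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT id, user_id, type
FROM saved_search
WHERE end_timestamp > CURRENT_TIMESTAMP
  AND (notification_channel1 = true OR notification_channel2 = true)
  AND last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'
ORDER BY user_id, type, title;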

If you can't do anything about your sort being so slow, then you would want an index-only scan, which can avoid the sort. That means the index must start with the columns being sorted on. The other columns can go in the INCLUDE, but there is generally no point in putting them there rather than in the index body, and the non-sort columns can be listed in any order.

(user_id, type, title, end_timestamp, notification_channel1, notification_channel2, last_notification_timestamp, id)
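
Written out as a statement (the index name is invented; this assumes the table really is saved_search and that type is a column simply omitted from the posted CREATE TABLE):

CREATE INDEX saved_search_ios_idx
ON saved_search (user_id, type, title, end_timestamp,
                 notification_channel1, notification_channel2,
                 last_notification_timestamp, id);
-- Note: very long title values could bump into the btree entry size limit (roughly 2.7kB).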

If your supplied params of the channels never change, then a partial index might be slightly better:

 (user_id, type, title, end_timestamp, last_notification_timestamp, id) WHERE notification_channel1 = true OR notification_channel2 = true
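
As a full statement (again with an invented name):

CREATE INDEX saved_search_ios_partial_idx
ON saved_search (user_id, type, title, end_timestamp,
                 last_notification_timestamp, id)
WHERE notification_channel1 = true OR notification_channel2 = true;

For the planner to actually skip the heap with an index-only scan, the visibility map has to be reasonably current, so make sure the table gets (auto)vacuumed after large write bursts.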

You didn't increase your work_mem by enough to do the sorts in memory. The on-disk format is more compact than the in-memory format, so a sort that took 72,232kB of disk might take ~250MB of memory to do entirely in memory. Whether that comes with a financial cost we can't say, as we don't know how much RAM you currently have, how much of that RAM typically goes unused, or how many concurrent sessions you expect to have. Increasing work_mem, but not by enough to get rid of the external merge sort altogether, can make the sorting slower (not dramatically, but noticeably), I think because it makes poorer use of the layers of caching above registers but below main memory (L1, L2, L3, etc.).

jjanes
  • Thanks for your feedback. Insightful indeed. What is still not 100% clear to me is the following : am I reaching 100% CPU load because of the poor sorting perf or because my query is returning a LOT of records ? Or both ? My query does return around 2-3 million records !! – ah_ben Dec 29 '22 at 23:55
  • It is the sort. EXPLAIN (ANALYZE) doesn't actually return the rows to the client, it throws them away and returns the plan in their place. Actually returning the rows could *also* be slow, but we have no way to assess that with the current info. – jjanes Dec 30 '22 at 05:34