5

we are using PostgREST in our project for some quite complex database views.

From some point on, when we are using limit and offset (x-range headers or query parameters) with sub-selects we get very high response times.

From what we have read, it seems like this is a known issue where postgresql executes the sub-selects even for the records which are not requested. The solution would be to jiggle a little with the offset and limit, putting it in a subselect or a CTE table.

Is there an internal GUC value or something similar that we can use in the database views in order to optimize the response times ? Does anybody have a hint on how to achieve this ?

EDIT: as suggested here are some more details. Let's say we have a relationship between product and parts. I want to know the parts count per product (this is a simplified version of the database views that we are exposing).

There are two ways of doing this

A. Subselect:

    SELECT products.id
            ,(
                    SELECT count(part_id) AS total
                    FROM parts
                    WHERE product_id = products.id
                    )
    FROM products limit 1000 OFFSET 99000

B. CTE:

    WITH parts_count
    AS (
            SELECT product_id
                    ,count(part_id) AS total
            FROM parts
            GROUP BY product_id
            ORDER BY product_id
            )
    SELECT products.id
            ,parts_count.total
    FROM products
    LEFT JOIN parts_count ON parts_count.product_id = product.id 
    LIMIT 1000     
    OFFSET 99000

Problem with A is that the sub-select is performed for every row so even though I read only 1000 records there are 100 000 subselects.

Problem with B is that the join with parts_count table takes very long since there are 100 0000 records there (although the with query takes only 200 ms! for 2000 records). Ideally I would like to limit the parts_count table with the same limit and offset as the main query but I can't do this in PostgREST since it just appends the limit and offset at the end, I don't have access to those parameters inside the WITH query

cip123
  • 63
  • 1
  • 6

1 Answers1

4

It is unavoidable that high OFFSET leads to bad performance.

There is no other way to compute OFFSET but to scan and discard all the rows until you reach the offset, and no database in the world will be fast if OFFSET is high.

That's a conceptual problem, and the only way to avoid it is to avoid OFFSET.

If your goal is pagination, then usually keyset pagination is a better solution:

You add an ORDER BY clause that matches your requirements, make sure there is a unique key in the ORDER BY clause and remember the last value you found. To fetch the next page, add a WHERE condition with that values. With proper index support, this can be very fast.

For your query, a more efficient version is probably:

SELECT p.id
       count(parts.part_id) AS total
FROM (SELECT id FROM products
      LIMIT 1000 OFFSET 99000) p
   LEFT JOIN parts ON parts.product_id = p.id
GROUP BY p.id;

It is rather weird that you have no ORDER BY, but LIMIT and OFFSET. That doesn't make much sense.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • The offset of the main query is not so much of a problem here, what is causing the performance loss is the sub-selects. I did a test where I used a CTE with the same offset as the main query and joined it with the main query afterwards, and the performance was acceptable. So I am trying to intercept the pagination of the outer query in the subselect. Something like the accepted answer here https://stackoverflow.com/questions/29307991/postgres-why-is-the-performance-so-bad-on-subselects-with-offset-limit – cip123 May 30 '18 at 05:46
  • Ok. If you need more information, you'll have to add more details to your question. – Laurenz Albe May 30 '18 at 06:26
  • I have updated the original question with more details – cip123 May 30 '18 at 09:55
  • I have added an improved query to the answer. – Laurenz Albe May 30 '18 at 16:18
  • This is just a simplified version of my query, I have an order by in the original. I am a little dubious it will work since like I said PostgREST just add the `limit` and `offset` at the end of the query but I will try it tomorrow and let you know how it works. – cip123 May 30 '18 at 17:22
  • A framework that does not allow you to use custom SQL is lame. – Laurenz Albe May 31 '18 at 04:01