I would like to include a row_number column in my result set: a sequence where 1 is the newest item, without gaps. This works:

SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10;

Now I would like to query for the same data in chunks of 1000 each to be easier on memory:

SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;

Here the row_number restarts from 1 for every chunk, but I would like it to be as if it were part of the global query, as in the first case. Is there an easy way to accomplish this?

tdma

4 Answers


Assuming:

  • id is defined as PRIMARY KEY - which means UNIQUE and NOT NULL. Else you may have to deal with NULL values and/or duplicates (ties).

  • You have no concurrent write access on the table - or you don't care what happens after you have taken your snapshot.

A MATERIALIZED VIEW, like you demonstrate in your answer, is a good choice.

CREATE MATERIALIZED VIEW mv_temp AS
SELECT row_number() OVER (ORDER BY id DESC) AS rn, id, title
FROM   mytable
WHERE  group_id = 10;

But the index and subsequent queries must be on the row number rn to get the data in "chunks of 1000":

CREATE INDEX ON mv_temp (rn);

SELECT * FROM mv_temp WHERE rn BETWEEN 1001 AND 2000;  -- e.g. the second chunk of 1000
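
More generally (a sketch of the pattern): chunk k, 1-based with a chunk size of 1000, covers rn from (k-1)*1000 + 1 to k*1000. For example:

SELECT * FROM mv_temp WHERE rn BETWEEN 2001 AND 3000;  -- third chunk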

Your implementation would require a guaranteed gap-less id column - which would void the need for an added row number to begin with ...
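To illustrate (a sketch against the same table): with gaps from deleted rows, or with ids belonging to other groups, a fixed id range does not map to a fixed number of rows:

SELECT count(*)
FROM   mytable
WHERE  group_id = 10 AND id >= 0 AND id < 1000;
-- anything from 0 to 1000 rows, not a guaranteed chunk of 1000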

When done:

DROP MATERIALIZED VIEW mv_temp;

The index dies with the table (materialized view in this case) automatically.

Erwin Brandstetter
  • Haha I was going to accept my own answer, but you rewrote it with more care and detail, interesting situation. For my use case it doesn't matter whether the index/querying is done on id or rn as I use this for batch exporting data. I will accept your answer so it goes up faster, as I think the other answers are not suitable for big tables. – tdma May 02 '16 at 21:40

You want to have a query for the first 1000 rows, then one for the next 1000, and so on?

Usually you just write one query (the one you already use), have your app fetch 1000 records, do something with them, then fetch the next 1000, and so on. Hence there is no need for separate queries.
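
For completeness, that single-query approach can be spelled out with a server-side cursor (a sketch in plain PostgreSQL; the cursor name numbered_rows is made up):

BEGIN;

DECLARE numbered_rows CURSOR FOR
SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10;

FETCH 1000 FROM numbered_rows;  -- first chunk
FETCH 1000 FROM numbered_rows;  -- next chunk; rn continues without restarting

CLOSE numbered_rows;
COMMIT;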

However, it would be rather easy to write such partial queries:

select *
from
(
  SELECT id, row_number() over (ORDER BY id desc) AS rn, title
  FROM mytable
  WHERE group_id = 10
) numbered
where rn between 1 and 1000; -- <- simply change the row number range here
                             --    e.g. where rn between 1001 and 2000 for the second chunk
Thorsten Kettner
  • I understand the concept of db-side cursors as you suggest, but unfortunately sometimes it's not possible to use them. Your answer works, but is very slow on big tables, due to the final "between x and y" filter that is not indexed: I discovered that pgsql does sequential filtering over the full subquery to get the final pagination, which takes up to 30 seconds on my table :( – tdma May 02 '16 at 06:52
  • However you write the query, the DBMS will have to sort the whole table in order to know what rows 5001 to 6000 are for instance and then move to that chunk. I don't see a real alternative. (Madhivanan's suggestion to use `LIMIT` and `OFFSET` is not a bad idea, but should result in about the same thing; order the records in a subquery, move to the chunk, then give row_numbers to these rows.) – Thorsten Kettner May 02 '16 at 07:03
  • 1
    OFFSET is not suitable to paginate over millions of rows. – tdma May 02 '16 at 08:08

You need pagination. Try this:

SELECT id, row_number() over (ORDER BY id desc)+0 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;

Next time, when you change the start value of id in the WHERE clause, change the offset added to row_number() as well, like below:

SELECT id, row_number() over (ORDER BY id desc)+1000 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 1000 AND id < 2000
ORDER BY id ASC;

Or better, you can use the OFFSET and LIMIT approach for pagination: https://wiki.postgresql.org/images/3/35/Pagination_Done_the_PostgreSQL_Way.pdf
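
For reference, a LIMIT / OFFSET version might look like this (a sketch; the window function is evaluated before LIMIT / OFFSET, so the numbering stays global, though OFFSET gets slow for deep pages on large tables):

SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10
ORDER BY id DESC
LIMIT 1000 OFFSET 1000;  -- second chunk; row_number continues from 1001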

Madhivanan
  • But then, doing this +1000, you assume that each chunk/page has exactly 1000 results, which may not be the case, due to gaps, or maybe some rows not satisfying group_id=10, right? – tdma May 02 '16 at 06:54
  • Then look at the link I posted – Madhivanan May 02 '16 at 11:43

In the end, I did it this way:

First I create a temporary materialized view:

CREATE MATERIALIZED VIEW vw_temp AS
SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10;

Then I define the index:

CREATE INDEX idx_temp ON vw_temp USING btree(id);

Now I can perform all operations very quickly, and with numbered rows:

SELECT * FROM vw_temp WHERE id BETWEEN 1000 AND 2000;

After doing the operations, cleanup:

DROP INDEX idx_temp;
DROP MATERIALIZED VIEW vw_temp;

Even though Thorsten Kettner's answer seems the cleanest one, it was not practical for me due to being too slow. Thanks to everyone for contributing. For those interested in the practical use case: I use this for feeding data to the Sphinx indexer.

tdma