
I have more than 10 million records in a table.

SELECT * FROM tbl ORDER BY datecol DESC
LIMIT 10
OFFSET 999990

Output of EXPLAIN ANALYZE on explain.depesz.com.
Executing the above query takes about 10 seconds. How can I make this faster?

Update

The execution time is reduced by half by using a subquery:

SELECT * FROM tbl WHERE id IN
(SELECT id FROM tbl ORDER BY datecol DESC LIMIT 10 OFFSET 999990)

Output of EXPLAIN ANALYZE on explain.depesz.com.

Sumither S

2 Answers


You need to create an index on the column used in ORDER BY. Ideally in the same sort order, but PostgreSQL can scan indexes backwards at almost the same speed.

CREATE INDEX tbl_datecol_idx ON tbl (datecol DESC);

More about indexes and CREATE INDEX in the current manual.
Test with EXPLAIN ANALYZE to get actual times in addition to the query plan.
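
For example, to check whether the plan switches to an index scan after creating the index, re-run the query from the question under EXPLAIN ANALYZE (a minimal sketch, using the same table and column names):

EXPLAIN ANALYZE
SELECT * FROM tbl ORDER BY datecol DESC
LIMIT 10
OFFSET 999990;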

Of course all the usual advice for performance optimization applies, too.

Erwin Brandstetter

I was trying to do something similar myself with a very large table (>100 million records) and found that using OFFSET / LIMIT was killing performance. Fetching a row at offset 10 million (with LIMIT 1) took about 1.5 minutes, and the time kept growing with the offset. By record 50 million I was up to 3 minutes per select, even using subqueries.

I came across a post here which details useful alternatives.

I modified this slightly to suit my needs and came up with a method that gave me pretty quick results.

CREATE TEMPORARY TABLE just_index AS
SELECT ROW_NUMBER() OVER (ORDER BY [VALUE-You-need]) AS row_number,
       [VALUE-You-need]
FROM [your-table-name];
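
If lookups against the temporary table are still slow, an index on the numbering column may help (an optional extra step, not part of the original recipe; it assumes the row_number alias above):

CREATE INDEX just_index_rn_idx ON just_index (row_number);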

This was a once-off: it took about 4 minutes, but I then had all the values I wanted. Next was to create a function that would loop at the "Offset" I needed:

create or replace function GetOffsets()
returns void as $$
declare
    -- For this part of the function I only wanted values after 90 million up to 120 million
    counter bigint := 90000000;
    maxRows bigint := 120000000;
begin
    drop table if exists OffsetValues;
    create temp table OffsetValues
    (
        offset_myValue bigint
    );

    while counter <= maxRows loop
        insert into OffsetValues (offset_myValue)
        select [VALUE-You-need]
        from just_index
        where row_number > counter
        order by row_number  -- without this, LIMIT 1 returns an arbitrary qualifying row
        limit 1;

        -- here I'm looping every 500,000 records - this is my 'Offset'
        counter := counter + 500000;
    end loop;
end;
$$ LANGUAGE plpgsql;

Then run the function:

select GetOffsets();

Again, a once-off amount of time (retrieving one of my offset values went from ~3 minutes to ~3 milliseconds). Then select from the temp table:

select * from OffsetValues;
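
From there, a page can presumably be fetched with a range predicate instead of OFFSET (a hedged sketch using the same placeholders as above; [one-value-from-OffsetValues] stands for a single boundary value read from the temp table):

SELECT *
FROM [your-table-name]
WHERE [VALUE-You-need] >= [one-value-from-OffsetValues]
ORDER BY [VALUE-You-need]
LIMIT 500000;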

This worked really well for me in terms of performance - I don't think I'll be using OFFSET going forward if I can help it.

Hope this improves performance for any of your larger tables.

John Youngtree