
Following is the query I use to fetch a fixed number of records from a database with millions of records:

select * from myTable LIMIT 100 OFFSET 0

What I observed is that if the offset is very high, say 90000, the query takes much longer to execute. Following is the time difference between two queries with different offsets:

select * from myTable LIMIT 100 OFFSET 0       //Execution Time is less than 1sec
select * from myTable LIMIT 100 OFFSET 95000   //Execution Time is almost 15secs

Can anyone suggest how to optimize this query? I mean, the query execution time should be the same and fast for any number of records I wish to retrieve, from any OFFSET.

Update: The actual scenario is that I have a database with more than 1 million records. Since it runs on an embedded device, I can't simply do "select * from myTable" and fetch all the records from the query at once; the device crashes. Instead, I fetch records batch by batch (batch size = 100 or 1000 records) using the query mentioned above, but as I mentioned, it becomes slow as the offset increases. My ultimate aim is to read all the records from the database, and since I can't fetch them all in a single execution, I need some other efficient way to achieve this.
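For concreteness, the batching described above amounts to issuing the same query over and over with the offset advanced by the batch size, for example:

select * from myTable LIMIT 100 OFFSET 0
select * from myTable LIMIT 100 OFFSET 100
select * from myTable LIMIT 100 OFFSET 200

and so on until no more rows are returned; the later batches are the slow ones.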

5 Answers


As JvdBerg said, indexes are not used with LIMIT/OFFSET. Simply adding 'ORDER BY indexed_field' will not help either.

To speed up pagination you should avoid LIMIT/OFFSET and use a WHERE clause instead. For example, if your primary key field is named 'id' and has no gaps, then your code above can be rewritten like this:

SELECT * FROM myTable WHERE id>=0     AND id<100     //very fast!
SELECT * FROM myTable WHERE id>=95000 AND id<95100   //as fast as previous line!
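If 'id' does have gaps, a gap-tolerant variant of the same idea (also suggested in a comment below) is keyset pagination: remember the largest id returned by the previous batch and start the next batch just after it. A minimal sketch, with :last_seen_id as a placeholder for that remembered value:

SELECT * FROM myTable WHERE id > :last_seen_id ORDER BY id LIMIT 100

Each batch then seeks directly into the primary-key index instead of skipping over all of the preceding rows.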
– user318750
  • It's sort of a workaround, but with the gaps in ROWID in my current database, "id<95100" will return about the 50000th record, which is a significantly different result. – tekHedd Sep 13 '22 at 18:50

As @user318750 said, if you know you have a contiguous index, you can simply use

select * from Table where index >= %start and index < %(start+size)

However, those cases are rare. If you don't want to rely on that assumption, use a sub-query, for example on rowid, which is always indexed:

select * from Table where rowid in (
  select rowid from Table limit %size offset %start)

This speeds things up, especially if you have "fat" rows (e.g. rows that contain blobs).

If maintaining the record order is important (it usually isn't), you need to order the indices first:

select * from Table where rowid in (
  select rowid from Table order by rowid limit %size offset %start)
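As a usage example, plugging the table name and numbers from the question into the %size/%start placeholders (table myTable, batch size 100, offset 95000):

select * from myTable where rowid in (
  select rowid from myTable order by rowid limit 100 offset 95000)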
– P-Gn
  • I tried this out and was very surprised to find that it worked! The subquery here `SELECT rowid FROM Table LIMIT 30 OFFSET 100000` is indeed quick on my test database. But this seems to be an artefact of something internal and very fragile: Running `EXPLAIN QUERY PLAN`, I found it was using a seemingly unrelated compound index, and went back to a slow full table scan when I removed this index even though the index on `rowid` still existed (as it always does). Putting `ORDER BY rowid` into the inner query also caused a full table scan, even though that's presumably the order it was already using. – Arthur Tacca Feb 03 '20 at 13:14
  • Most bizarrely, changing `SELECT rowid` to `SELECT *` on the inner query (in which case you don't need the outer query any more) causes the query to use a full table scan, and slow down noticeably, even though I would expect a change like that to not change which index is selected. Overall, I think it's much safer to replace `LIMIT` and `OFFSET` with a `WHERE` clause on an incrementing counter column that is indexed, as user318750 suggested in their answer. – Arthur Tacca Feb 03 '20 at 13:14
  • @ArthurTacca The starting point of my answer is precisely to get rid of the assumption that one has a contiguous counter. In my experience, virtually no application ever ensures the presence of a contiguous counter. The fact that ids are not reused after being deleted is usually seen as a desirable feature. – P-Gn Feb 17 '20 at 14:12

By doing a query with an offset of 95000, all 95000 preceding records are processed. You should create an index on the table, and use that for selecting records.
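A minimal sketch of that idea, assuming the indexed column is an integer named id (note, per the comments below, that the index only helps when combined with a WHERE bound; ORDER BY plus OFFSET alone still scans the skipped rows):

CREATE INDEX IF NOT EXISTS idx_myTable_id ON myTable(id);  -- not needed if id is already the INTEGER PRIMARY KEY
SELECT * FROM myTable WHERE id > 95000 ORDER BY id LIMIT 100;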

– JvdBerg
  • Sorry, I didn't understand. By making an index, what exactly should I do? – Satya Prakash Panigrahi Sep 04 '12 at 15:03
  • CREATE AN INDEX on the field that is important to you, and then make sure that field is used in the query: 'select * from myTable order by index_field LIMIT 100 OFFSET 95000'. SQLite will then process the index instead of the table, which is much faster! – JvdBerg Sep 04 '12 at 15:05
  • Oh, ok ok. See, my database has 6 columns, among which one is "INTEGER PRIMARY KEY". As far as I know, indexes are automatically created for columns declared as primary key. So, for "order by index_field", is it OK if I mention the primary key column name in place of "index_field" without manually creating an index for that field? – Satya Prakash Panigrahi Sep 04 '12 at 15:14
  • Yes! You should be fine then! – JvdBerg Sep 04 '12 at 15:16
  • Thanks. Actually, I have never used indexing in my SQL till date; that's why I want to avoid it. I will implement your query tomorrow and get back to you if I face any issue. – Satya Prakash Panigrahi Sep 04 '12 at 15:20
  • Hi JvdBerg, it's not working. I tried adding "order by index_field" to my existing query but it didn't make any difference. :( – Satya Prakash Panigrahi Sep 05 '12 at 09:02
  • Can you try this, select * from myTable where index_field>95000 LIMIT 100 – Sathesh Feb 03 '14 at 00:09
  • @SatyaPrakashPanigrahi If you are concerned with query performance with 95000 rows, it is essential that you use indexes and not avoid them. You must start to use them, as they are probably the single most important influence on your database performance. Compare n with log n: in this instance, 100000 to 5! (exclamation mark, not factorial!) – The Lonely Coder Sep 23 '14 at 15:17
  • @JvdBerg You said "`select * from myTable order by index_field LIMIT 100 OFFSET 95000` SQLite will then process the index instead of the table", but that is not true. You can run it yourself with `EXPLAIN QUERY PLAN` to see it does a full table scan (or just look at how long the query takes!). You need to replace the `OFFSET` with a `WHERE` clause on the indexed column, as user318750's answer says. – Arthur Tacca Feb 03 '20 at 13:17
A variation on the subquery approach: first locate the rowid of the row at the desired offset, then fetch that single row by rowid:

select * from data where rowid = (select rowid from data limit 1 offset 999999);
– Srdjan
  • I'm amazed that the only good answer for my problem was from 8 years ago. That's a great answer if your only id column is a text-based column. Many thanks man. – Morph21 Jan 26 '21 at 19:49
  • worth noting that you can order by an indexed column in the subquery here and it's still performant! – Beau Apr 26 '22 at 19:32

With SQLite, you don't need to get all rows returned at once in a big fat array; you can get called back for every row. This way, you can process the results as they come in, which should address both your crashing and performance issues.

I guess you're not using C, as you would already be using a callback, but this technique should be available in any other language.

JavaScript example (adapted from: https://www.npmjs.com/package/sqlite3; the database filename below is a placeholder):

const sqlite3 = require('sqlite3');
const db = new sqlite3.Database('mydb.sqlite'); // placeholder path to your database file

db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
  console.log(row.id + ": " + row.info); // called once per row, as results arrive
});