4

I've got a table with int field, let's call it createTime. The table consists of few million records. Now I want to run the query:

select * from `table` order by `createTime` desc limit 500000, 10

I've created an index for createTime, but the query runs INCREDIBLY slow. What's the reason? How can I improve it?

Here's what EXPLAIN says:

id 1
select_type simple
table table
type index
possible_keys null
key createTime
key_len 4
ref null
rows 500010
extra

As for the offset, it's working much faster when it's small.

Sebastian Nowak
  • 5,607
  • 8
  • 67
  • 107
  • Have you tried using the EXPLAIN keyword to see what MySql is doing? Effectively, EXPLAIN select * from `table` order by `createTime` desc limit 500000, 10. Out of interest, what data type is createTime? – dash Dec 11 '11 at 20:30
  • I've explained in the question it's an int. EXPLAIN says nothing interesting, it's using the created index and it estimates the number of rows to 500010. – Sebastian Nowak Dec 11 '11 at 20:38
  • (What was the performance like *before* the index was created? What about a limit with no offset? A limit with a smaller offset?) –  Dec 11 '11 at 20:42
  • @SebastianNowak: Why not _show_ us what `EXPLAIN` says? Since you're asking for alternative interpretations of the data... – Lightness Races in Orbit Dec 11 '11 at 20:45
  • Sorry, I meant what does a typical createTime column contain - a few example values. Also, are you using ISAM, InnoDB? – dash Dec 11 '11 at 20:48
  • possible duplicate of [Why does MYSQL higher LIMIT offset slow the query down?](http://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down) – Lightness Races in Orbit Dec 11 '11 at 20:48
  • Does it make a difference if you write `select "createTime" from "table" order by "createTime" desc limit 500000, 10`? (Sorry, I don't know how to embed backticks in comments.) – Neil Dec 11 '11 at 20:50
  • It's MyISAM, and createTime contains the result of time() function in PHP - so time in seconds since 1970. I've updated the question to provide more details. – Sebastian Nowak Dec 11 '11 at 20:51
  • @Neil: `With a \`Backslash\`?` Same as in non-comment posts. – Lightness Races in Orbit Dec 11 '11 at 20:51
  • @Neil: It does, now the query takes few seconds to run – Sebastian Nowak Dec 11 '11 at 20:53
  • @TomalakGeret'kal Ah, I read the markdown help but didn't see that, thanks! – Neil Dec 11 '11 at 21:06
  • check out my answer here http://stackoverflow.com/questions/4942742/large-php-arrays-pagination/4943129#4943129 – Jon Black Dec 12 '11 at 20:48

3 Answers3

9

General rule: avoid OFFSET for large tables.

[A]s the offset increases, the time taken for the query to execute progressively increases, which can mean processing very large tables will take an extremely long time. The reason is because offset works on the physical position of rows in the table which is not indexed. So to find a row at offset x, the database engine must iterate through all the rows from 0 to x.

The general rule of thumb is “never use offset in a limit clause”. For small tables you probably won’t notice any difference, but with tables with over a million rows you’re going to see huge performance increases.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
2

You can speed this up if you have a unique column. Ideally it would be createTime itself:

SELECT "table".*
  FROM "table"
  INNER JOIN (
    SELECT "createTime"
      FROM "table"
      ORDER BY "createTime" DESC
      LIMIT 500000, 10
  ) AS "limit" ON "table"."createTime" = "limit"."createTime"

If createTime is not unique, but you have another column that is unique, then you may find you need to create a composite index on createTime and your other column in order for this query to run efficiently:

SELECT "table".*
  FROM "table"
  INNER JOIN (
    SELECT "createTime", "unique"
      FROM "table"
      ORDER BY "createTime" DESC
      LIMIT 500000, 10
  ) AS "limit" ON "table"."unique" = "limit"."unique"
Neil
  • 54,642
  • 8
  • 60
  • 72
  • 1
    Thanks, but this query still takes few seconds to run. It's unacceptable as for a website. Isn't there any faster way to implement pagination basing on createTime column? – Sebastian Nowak Dec 11 '11 at 22:36
0

I think indexing won't change anything. Using offset, limit means "read offset + limit datasets and discard (num of offset) of them". If you really want have a pagination or something like that for such a large table, you should use a method where you can limit your results in the WHERE part of your query. Those type of query will benefit of a proper index.

Using datetimes, a solution might be to use timeslots to display your data. E.g. you can display links for each day of a week and build your query like "WHERE createDate > '2011-12-11' AND crateDate < '2011-12-12'.

hacksteak25
  • 2,009
  • 1
  • 16
  • 23