
I have a table with currently ~1,500 rows, which is expected to grow over time (I can't say by how much, but still). The website is read-only and lets users run complex queries through some forms; the search query is then fully URL-encoded, since it's a public database. It's important to note that users can choose which column the results are sorted by.

I'm not concerned about adding indexes and slowing down INSERTs and UPDATEs (they're only performed occasionally by admins), since the workload is read-heavy. However, I need to paginate the results: some popular queries return 900+ rows, which takes up too much space and RAM on the client side (each result is further processed into a fairly rich <div> HTML element with an <img>, btw).

  • I'm aware of OFFSET {$m} LIMIT {$n} but would like to avoid it.
  • I'm aware of this approach:

Query:

SELECT *
FROM table
WHERE {$filters} AND id > {$last_id}
ORDER BY id ASC
LIMIT {$results_per_page}

and that's what I'd like to use, but it requires the rows to be sorted by their ID alone!

  • I've come up with (what I think is) a very similar query that custom-sorts the results and still allows efficient pagination.

Query:

SELECT *
FROM table
WHERE {$filters} AND {$column_id} > {$last_column_id}
ORDER BY {$column} ASC
LIMIT {$results_per_page}

but that unfortunately requires a {$last_column_id} value to be passed between pages!

I know that indexes (especially unique indexes) essentially act as automatically updated structures that "rank" a table by the values of a column (be it integer, varchar, etc.), but I really don't know how to make MySQL return the $last_column_id needed for that query to work!

The only thing I can come up with is to add an extra "XYZ_id" integer column next to every "XYZ" column users can sort by, and then update its values periodically through some scripts. Is that really the only way to make this work? Please help.

Alain D'Ettorre
  • Instead of sorting by `{$column}`, sort by `{$column}, id`: then all you need do is remember the last seen values of `{$column}` and `id` and specify them both in your `WHERE` criteria, e.g. using (concise) row constructors `WHERE ({$column}, id) > ({$last_value}, {$last_id})` or using (sargable) explicit boolean logic `WHERE ({$column} = {$last_value} AND id > {$last_id}) OR {$column} > {$last_value}` (see the sketch after these comments). – eggyal Jun 07 '16 at 15:01
  • With pagination you'd normally pass through the *page number* to the script and then use that to calculate the starting index for the LIMIT - for instance `LIMIT 99, 50` to get the 100-149th records... is that not what you're after? – CD001 Jun 07 '16 at 15:06
  • Why do you want to avoid offset? Since you have $results_per_page and want to paginate, the offset would be ($page * $results_per_page), and you can order by whatever. It would be LIMIT {$page * $results_per_page}, {$results_per_page} – Mario Chueca Jun 07 '16 at 15:07
  • @MarioChueca: Offset can be very costly: MySQL must seek that number of records into the resultset, only then to discard all those it has passed over. Better instead to take advantage of indexes that can take it directly to the starting point. – eggyal Jun 07 '16 at 15:11
  • @eggyal I may be wrong but if the `ORDER BY` is on an index then the `LIMIT x, y` *should* be pretty fast, shouldn't it? It *shouldn't* need a filesort... – CD001 Jun 07 '16 at 15:17
  • @CD001: It's not about requiring a filesort. After the sorted resultset has been produced (howsoever that occurred, whether using an index or filesort or anything else), MySQL must then seek `x` records into it before it can begin sending results to the client—this can produce very significant delays for large values of `x`. See [Why does MYSQL higher LIMIT offset slow the query down?](http://stackoverflow.com/q/4481388) – eggyal Jun 07 '16 at 15:19
  • @eggyal thanks :) This made for interesting reading (linked from that SO link) : https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/ – CD001 Jun 07 '16 at 15:26
  • @eggyal that seems like a good solution, but I'd still be reading all the values and ids of the returned rows for the first page, taking the highest of them (last_value and last_id), then printing them in the Back/Next links to pass them between pages. That would ultimately result in links like search.php?FILTERS&last_val=blabla&last_id=123, where "blabla" could be as long as a full line, while a dedicated "index-like" column would always require just search.php?FILTERS&last_id=123; still a lot, but less code for the client and more maintenance for me. Don't you agree? – Alain D'Ettorre Jun 07 '16 at 16:11
  • @MarioChueca That approach is actually easier both for the client and for me, but I've read that it gets much slower as more data is stored in the table, so I'd like to avoid it if possible; that's why I asked! – Alain D'Ettorre Jun 07 '16 at 16:13
  • I've tried retrieving the last 50 rows with no WHERE clause applied, both with a plain OFFSET M LIMIT N and with the "deferred join" method described at [link](http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/), and the latter performed in ~32% less time than the simple OFFSET, pretty good. Then I manually tried something like `WHERE id>last_id` and got a striking ~76% less execution time! Gotta maintain an "index-like" column or just use the deferred join after all – Alain D'Ettorre Jun 07 '16 at 16:38
  • I did some testing on this on a pretty hefty database table: using an offset limit like `LIMIT 25000, 100`, the initial query executed in something like *0.13s*, which is OK-ish, and gives you a list of PRIMARY keys. You can then `INNER JOIN` that result onto your data table to retrieve the actual dataset. As long as your initial LIMITed query is only looking at a single index it's pretty fast. – CD001 Jun 08 '16 at 16:02
  • I tried the "deferred join" method on a simple `SELECT * FROM table` with no WHERE or ORDER BY clauses and it actually performed quite a bit better than a simple OFFSET. However, with real-world WHERE and ORDER BY clauses the deferred join method did even worse than a simple OFFSET. I wonder whether the WHERE and ORDER BY clauses must go inside the derived table (as I did) or not. Still, I'm starting to think maybe LIMIT N OFFSET M is the only way to go here! – Alain D'Ettorre Jun 09 '16 at 17:24
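
A minimal sketch of the keyset ("seek") approach from eggyal's first comment, assuming an illustrative sortable column price alongside the primary key id (these names, like {$filters}, are placeholders):

-- First page: no cursor yet; tie-break on id so the order is deterministic
SELECT *
FROM table
WHERE {$filters}
ORDER BY price ASC, id ASC
LIMIT {$results_per_page};

-- Later pages: pass the last row's (price, id) back as the cursor.
-- Sargable boolean form of WHERE (price, id) > ({$last_price}, {$last_id}):
SELECT *
FROM table
WHERE {$filters}
  AND (price > {$last_price}
       OR (price = {$last_price} AND id > {$last_id}))
ORDER BY price ASC, id ASC
LIMIT {$results_per_page};

With an index whose leading columns match the sort (e.g. INDEX(price, id)), MySQL can seek straight to the cursor position instead of stepping over all the skipped rows.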

1 Answer


(Too many comments to fit into a 'comment'.)

Is the query I/O bound? Or CPU bound? With a mere 1,500 rows it seems it would be CPU-bound and fast enough.

What engine are you using? How much RAM? What are the settings of key_buffer_size and innodb_buffer_pool_size?

Let's see SHOW CREATE TABLE. If the table is full of big BLOBs or TEXT fields, we need to code the query to avoid fetching those bulky fields only to throw them away because of OFFSET. Hint: Fetch the LIMIT IDs, then reach back into the table to get the bulky columns.
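
A sketch of that hint, sometimes called a "deferred join" or late row lookup; x, y, and the integer PRIMARY KEY id are placeholders:

    SELECT t.*
    FROM (
        SELECT id          -- scan/sort only the ids, not the bulky columns
        FROM table
        WHERE x = ...
        ORDER BY y
        LIMIT 100, 20
    ) AS page
    JOIN table AS t ON t.id = page.id
    ORDER BY t.y;

The inner query can be satisfied from INDEX(x, y) alone (an InnoDB secondary index implicitly carries the primary key), so the BLOB/TEXT columns are read only for the final 20 rows.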

The only way for this to be efficient:

SELECT ...
    WHERE x = ...
    ORDER BY y
    LIMIT 100,20

is to have INDEX(x, y). But even that will still have to step over 100 cow paddies.
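
For example, with the placeholder names above (the index name idx_x_y is arbitrary):

    ALTER TABLE table ADD INDEX idx_x_y (x, y);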

You have implied that there are many possible WHERE and ORDER BY clauses? That suggests adding enough indexes to cover all cases is probably impractical?

"Remembering where you left off" is much better than using OFFSET, so try to do that. That avoids the already-discussed problem with OFFSET.

Do not use WHERE (a,b) > (x,y); that construct used not to be optimized well. (Perhaps 5.7 has fixed it, but I don't know.)
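
A sketch of the equivalent explicit form for older versions, with a standing for the sort column and b for the tie-breaking id:

    WHERE a > {$last_a}
       OR (a = {$last_a} AND b > {$last_b})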

My blog on OFFSET discusses your problem. (However, it may or may not help your specific case.)

Rick James
  • Thanks @RickJames, but although I know "remembering where you left off" is better, there's no clean way to do that in my application: there's only one numerical ID column, and it merely reflects the chronological order of the entries, while most queries order results primarily by other parameters and only secondarily by id, just to avoid ambiguity. That means I'd have to pass both the last value of the ordering column and the last ID to "remember where I left off", meaning additional and mysterious GET parameters, since the website is a heavy-read database. – Alain D'Ettorre Jun 18 '16 at 20:42