4

I've quite simple query

SELECT id,url,submitted FROM links where  submit_after<CURDATE() and submitted<4 order by ord limit 1438;

Table links has 10 000 000 rows and indexes on submitted,submit_after,ord. The ord field is changing very often because it's used as random field to get always random rows. The limit for query is also rand(1000,4000). Query takes 10 sec to execute

id  select_type table   type    possible_keys   key key_len ref rows    Extra   
1   SIMPLE  links   index   submitted,submit_after  ord 4   NULL    3849    Using where

Any suggestions how to get better performance?

Dharman
  • 30,962
  • 25
  • 85
  • 135
john93230
  • 243
  • 1
  • 2
  • 9

4 Answers4

3

MySQL only can use ONE index per Query. You can see it in the EXPLAIN output. it will use "ord" index.

So create a index over all Feld in the right order (submit_after, submitted, ord) to speed up the Query

Bernd Buffen
  • 14,525
  • 2
  • 24
  • 39
  • While I agree with the basic idea, I am not sure that the index is going to help that much due to the way that the values overlap. For example every row that matches the submit_after WHERE clause will need checking for submitted, as it is using < rather than = on the checks, so by the time it gets to using ORD the values are all over the place. If submitted has a limited number of values then it might well be better to have an index of submitted THEN submit_after – Kickstart Aug 25 '15 at 13:16
  • Now query takes 5 sec to execte is also svery slow. Explain shows "Using index condition; Using MRR; Using filesort" – john93230 Aug 25 '15 at 13:19
  • @Kickstart can You explain what you exatcly mean index on submitted? – john93230 Aug 25 '15 at 13:23
  • @john93230 - the field called submitted. As Bernd correctly points out MySQL will only use one index on a table in a query, but the index can cover multiple columns. Problem with your query is that it isn't specifically checking any values, rather checking that a value is less than a value. So the index is not going to narrow things down much. But if the value of submitted can be from (say) 0 to 1000, narrowing it down to less than 4 is probably going to limit the number of rows more effectively than checking against the current date first. – Kickstart Aug 25 '15 at 13:31
  • Checked your solution but is also doesn't speed much - still 4 sec to execute – john93230 Aug 25 '15 at 13:33
1

A major problem with this query is that indexes can't help much.

It is checking that any returned row has submitted < 4 . But one it has done that it doesn't narrow things down well for the next indexed field to check (ie submit_after < CURDATE() ).

This is because it is bring back a range of rows, but within that range it is checking for another independent range, so the 2nd part of the index is not that helpful.

Same if it does things the other way round, but probably worse (as there will be a lot more submit_after values each with their own range of values for submitted ).

As such the index covering multiple columns is probably not massively helping to reduce the number of records. Let alone having the 3rd column on the index to order the results.

This is probably why when presented with 3 separate indexes MySQL chose to use the index that allowed it to sort the results, and just checked the rows without an index for the WHERE clause.

I presume this query is for some kind of site link submit checking system, with links being valid to be checked after a certain date if they have not yet been checked. If so it might improve things if one of the values of submitted is specifically for rows that have submit_after < CURDATE() and are ready to be checked. This could then be updated by a separate query, and this SELECT could then just check for that value of submitted . An index covering both submitted and ord would then be useful.

Kickstart
  • 21,403
  • 2
  • 21
  • 33
0

You better use something like

$id = rand(1,10000000); SELECT from TABLE WHERE id = $id

Bernhard
  • 1,852
  • 11
  • 19
0

You didn't say what performance you require. But you have a challenging query to satisfy.

Here's a suggestion for you. Do a so-called deferred join. As it is you're grabbing a lot of rows, ordering them, and then throwing out most of them. Let's order and throw out less stuff.

A subquery gets the id values you need to retrieve, like so.

         SELECT id
           FROM links 
          WHERE submit_after < CURDATE()
            AND submitted < 4
          ORDER BY ord 
          LIMIT 1438

This subquery can be optimized by a compound index, but (as others have pointed out) not perfectly.

Which of your WHERE criteria is more selective? Which cuts down the number of resulting rows more aggressively? If it's the CURDATE criterion, use this index: (submit_after, submitted, ord). If it's the submitted < 4 criterion, use this index. (submitted, submit_after, ord). These are called compound covering indexes.

Now, running that query gives you a list of ID values. All that has to be sorted is id and ord, a lesser workload than sorting the whole mess.

Edit. It's quite true that the number of rows to be sorted is the same in the subquery as in the main query. So, you might ask why this saves time. Simply because there are fewer, and shorter, columns, to sort in the subquery. Sorting has to manipulate a lot of data (EXPLAIN calls it "filesort"). Cutting down the volume of data to be manipulated can be very helpful.

To finish your job you use a JOIN like so:

SELECT a.id, a.url, a.submitted
  FROM links AS a
  JOIN (
         SELECT id
           FROM links 
          WHERE submit_after < CURDATE()
            AND submitted < 4
          ORDER BY ord 
          LIMIT 1438
       ) AS b ON a.id = b.id
 ORDER BY a.ord

This retrieves a relatively much smaller number of rows from your main table once you have selected the ids you want. You end up sorting far less data: the ORDER BY this LIMIT that pattern is wasteful because it discards a lot of already-sorted data.

Notice that you have to re-sort the main query. You may, upon experimenting, find that the main query is already in the right order without the second ORDER BY. But you cannot rely on that behavior of the MySQL server.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Am I missing something here? Your sub query is exactly the same as the original query that is taking too long, except for only returning the id. With the same overhead for the SELECT and also sorting on ord. I can see that there is less data in the set to be sorted, but it is still the same number of rows to be sorted. – Kickstart Aug 26 '15 at 08:25
  • 1
    Please see my edit. This sort of deferred-join approach has actually worked for me, even if it's a little counterintuitive. – O. Jones Aug 26 '15 at 11:13
  • Interesting. Would like to have a play with this on this query (although I suspect the big issue is the WHERE clause). – Kickstart Aug 26 '15 at 11:23