
I have a large table called offers in my database (over 300,000 rows).

When I execute the query below it takes over 3 seconds.

$sql = "SELECT * FROM `offers` WHERE (`start_price` / `price` >= 2) ORDER BY RAND() LIMIT 1"; 

Table offers

`id` int(11) NOT NULL,
`title` text NOT NULL,
`description` text NOT NULL,
`image` text NOT NULL,
`price` float NOT NULL,
`start_price` float NOT NULL,
`brand` text NOT NULL

Is there any way to make it faster? I want to select one random row where start_price / price >= 2.

DMande
  • Don't order the query but pick a random row from your result with PHP? – Kevin Aug 24 '15 at 10:37
  • Generate a random id then pass it to the query. Takes less time. – Insane Skull Aug 24 '15 at 10:44
  • Note that storing prices as floats is a little odd. And I think it's faster to say: price * 2 < start_price – Strawberry Aug 24 '15 at 11:19
  • @Strawberry why do you think it's faster to use `price * 2 < start_price`? – axiac Aug 24 '15 at 11:33
  • @axiac Because price * 2 can be compared against an indexed value - but I might be mistaken. – Strawberry Aug 24 '15 at 11:38
  • @Strawberry an index can be used to find a row by comparing the value of a column against a constant expression. Multiple-field indexes work the same, they check one column at a time. The condition here uses `price` and `start_price` of the same record. No matter how you combine them, the value must be computed for every row before having something to search in the index. The solution is to add an extra column that holds the ratio between the two, index it and keep it up to date (`price` probably changes over time). As explained in [this answer](http://stackoverflow.com/a/32180530/4265352). – axiac Aug 24 '15 at 11:44
  • @axiac I accept your point, but I've just tested on a largish data set, and my solution appears to be consistently 30% faster than the OP's. – Strawberry Aug 24 '15 at 11:56
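
A minimal sketch of the approach Kevin and Insane Skull suggest above (count the matching rows, pick a random offset in PHP, then fetch a single row at that offset); the statements below are an illustration, not the OP's code, and the offset 123 stands in for a value the application would generate:

-- Step 1: count the rows that satisfy the condition.
SELECT COUNT(*) AS cnt FROM `offers` WHERE `start_price` / `price` >= 2;

-- Step 2: pick a random offset between 0 and cnt - 1 in the application,
-- then fetch exactly one row at that offset (123 here is only a placeholder).
SELECT * FROM `offers` WHERE `start_price` / `price` >= 2 LIMIT 1 OFFSET 123;

This avoids ORDER BY RAND() entirely, though both statements still have to evaluate the division for every row, and without an ORDER BY the row returned for a given offset is not strictly deterministic.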

3 Answers


I think your problem is that your query requires a full table scan for the WHERE clause. The ORDER BY makes things worse, depending on how many rows pass the filter.

You might consider storing this number in the table and adding an index to it:

alter table offers add column start_to_price float;

update offers
    set start_to_price = start_price / price;

create index idx_offers_s2p on offers(start_to_price);
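
As axiac notes in the comments under the question, `price` probably changes over time, so a derived column like this has to be kept in sync. One possible way to do that is a pair of triggers; this is only a sketch, the trigger names are made up, and on MySQL 5.7+ a generated column could serve the same purpose:

-- Keep start_to_price in step with the price columns on writes
-- (trigger names are illustrative only).
CREATE TRIGGER offers_s2p_ins BEFORE INSERT ON offers
FOR EACH ROW SET NEW.start_to_price = NEW.start_price / NEW.price;

CREATE TRIGGER offers_s2p_upd BEFORE UPDATE ON offers
FOR EACH ROW SET NEW.start_to_price = NEW.start_price / NEW.price;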

Then, your query might be fast:

SELECT o.*
FROM `offers` o 
WHERE start_to_price >= 2
ORDER BY RAND()
LIMIT 1;

If performance is still a problem, then I would be inclined to cut the result down to a small random sample with a WHERE clause first:

SELECT o.*
FROM `offers` o CROSS JOIN
     (SELECT COUNT(*) AS cnt FROM offers WHERE start_to_price >= 2) oo
WHERE o.start_to_price >= 2 AND rand() <= 10 / oo.cnt
ORDER BY RAND()
LIMIT 1;

This pulls about 10 rows at random and then chooses one of them.

If these don't work, then there are other solutions that get progressively more complicated.

Gordon Linoff
  • With the use of the cross join the performance improved, but I would like to know if there is another, more complicated solution. Thank you! – DMande Aug 24 '15 at 13:21

One option to make this faster is to ensure that you leverage indexing:

How does database indexing work?

http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

So in this case ensure that you have an index on start_price together with price, in that exact order.
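
A sketch of that composite index (the index name is made up; see the comment discussion below on whether such an index can actually serve the `start_price / price` expression):

ALTER TABLE `offers` ADD INDEX `idx_start_price_price` (`start_price`, `price`);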

Another way is to optimise the collation that is in use for the database and tables, so choose utf8mb4 over utf8 and, if sorting/localisation is not an issue for you and you want to push it as far as possible, general_ci over unicode_ci:

What's the difference between utf8_general_ci and utf8_unicode_ci
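
As a rough sketch, converting an existing table's character set and collation (assuming a MySQL build that supports utf8mb4) would look like this:

ALTER TABLE `offers` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;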

Despite the MyISAM storage engine delivering faster read speeds (http://www.rackspace.com/knowledge_center/article/mysql-engines-myisam-vs-innodb), I have found that there are various tweaks available to the InnoDB storage engine that can speed things up more than I was able to achieve using MyISAM:

https://dba.stackexchange.com/questions/5666/possible-to-make-mysql-use-more-than-one-core?lq=1

So something like the following would be another option:

[mysqld] # Don't change these unless you have read and understood what is going on
innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_buffer_pool_size=2G
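
After restarting the server, one quick sanity check (just a suggestion) is to confirm the new values were actually picked up:

SHOW VARIABLES LIKE 'innodb_%_io_threads';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';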

Yet another option is to take a look at alternate storage engines: https://www.percona.com/software/mysql-database/percona-server/benchmarks

You could also see the other answers for refactoring of your query :)

Craig van Tonder
  • This is a very generic comment. Agreed that indexing a table correctly would generally speed things up, but in this case we also have "ORDER BY RAND()": how would indexing help to improve that, exactly? – Dmitri Sologoubenko Aug 24 '15 at 11:12
  • @DmitriSologoubenko I dropped that comment. After reading the question more carefully I discovered the query is slow because of the `WHERE` clause. An index still doesn't help, but this is another discussion. – axiac Aug 24 '15 at 11:17
  • @DmitriSologoubenko I have suggested indexing first as he quotes 3 seconds on 300k records. Granted I have no idea what hardware this is but it seems very slow and it sounds like there are no indexes at all. Generic as it may be, I feel that it is a worthy solution here. – Craig van Tonder Aug 24 '15 at 11:20
  • @axiac How would an index not help? This is exactly what an index does - see my link to the MySQL documentation and note: "To find the rows matching a WHERE clause quickly." – Craig van Tonder Aug 24 '15 at 11:21
  • @axiac index said fields and in said order, optimise the table and what times do you get? – Craig van Tonder Aug 24 '15 at 11:25
  • @DmitriSologoubenko The comment in the question by Insane Skull in my mind would be the best route for speeding the RAND() aspect up. MySQL is not designed to deal with this kind of thing efficiently, PHP however might be a different story. Do the RAND() in PHP and bind it in as was suggested. – Craig van Tonder Aug 24 '15 at 11:28
  • @axiac, indexing is not always the right solution, sometimes an index may speed up a SELECT query at the expense of INSERTs or UPDATEs (which have to update the index). In the most extreme cases, indexes may actually slow down your application, not to mention the waste of resources. In order to decide whether your application would benefit from an index or not, you should consider what the most frequent uses for this table are: SELECTs (reads) or writes? – Dmitri Sologoubenko Aug 24 '15 at 11:28
  • @DmitriSologoubenko Agree wholeheartedly and this is something that I did not cover at all. – Craig van Tonder Aug 24 '15 at 11:29
  • @IndigoIdentity: you are most probably right, and given the statement in the question, that's certainly the right approach, however it would be of much more help to provide a more detailed and custom-fitted answer, for instance mentioning the `start_price` and `price` as fields to index. – Dmitri Sologoubenko Aug 24 '15 at 11:29
  • @DmitriSologoubenko but I did mention that exactly, read my answer nicely and say sorry lol :) – Craig van Tonder Aug 24 '15 at 11:29
  • @IndigoIdentity The index doesn't help because the `WHERE` clause contains the expression `start_price / price`. The value of the expression cannot be computed in advance, it changes for every row. This means it needs to be calculated for every row and that renders the index useless. If you don't believe me then create the table, put some data in it then put [`EXPLAIN`](http://dev.mysql.com/doc/refman/5.7/en/explain-output.html) in front of the query and check the column `key` in the returned result. It's empty. – axiac Aug 24 '15 at 11:31
  • @IndigoIdentity: my apologies, too many concurrent conversations made me completely ignore that part of your answer. Sorry. – Dmitri Sologoubenko Aug 24 '15 at 11:31
  • @DmitriSologoubenko No worries, it was an interesting chat, thanks and enjoy the day! :) – Craig van Tonder Aug 24 '15 at 11:33
  • Read Gordon's [answer](http://stackoverflow.com/a/32180530/4265352) for a partial solution to the problem. – axiac Aug 24 '15 at 11:35
  • @axiac Makes more sense, but I'm not understanding why you cannot compute this in advance if you are changing the values in the rows. Why not add a third column with the computed value and then index this? – Craig van Tonder Aug 24 '15 at 11:37
  • @IndigoIdentity Eureka! That's exactly what Gordon suggests in their answer. :-) – axiac Aug 24 '15 at 11:38
  • @axiac Gotcha :) That does sound like the best solution, compute the values and store this in a separate field but regarding RAND(), do this in PHP if possible, quite sure that it will crunch the numbers faster :) – Craig van Tonder Aug 24 '15 at 11:40

There are alternatives. The one I have used is described here:-

http://jan.kneschke.de/projects/mysql/order-by-rand/

Essentially you generate a random number that is between your min and max id, and then join that against your result set (using >=), with a limit of 1. So you get a result set starting from a random point in your full results and then just grab the first record.

Downside is that if your id values are not equally distributed then it isn't truly random.

Quick example code, assuming your offers table has a unique key called id:-

SELECT offers.* 
FROM offers 
INNER JOIN 
(
    -- Pick a random id value between the min and max matching id.
    SELECT RAND( ) * ( MAX( Id ) - MIN( Id ) ) + MIN( Id ) AS Id
    FROM offers
    WHERE (`start_price` / `price` >= 2)
) AS r2
-- Start from that random point and take the first matching row.
ON offers.Id >= r2.Id
WHERE (`start_price` / `price` >= 2) 
ORDER BY offers.Id LIMIT 1
Kickstart
  • Because of the `INNER JOIN` this query performs even worse than the original. It completely scans the table twice. – axiac Aug 24 '15 at 11:37
  • If you can avoid the issue with the calculation (adding an extra column for the calculated value as suggested by Gordon) then this solution should perform far more quickly than using ORDER BY RAND(). – Kickstart Aug 24 '15 at 11:46
  • Given there is an index on column `id` (the OP doesn't mention anything about the indexes they have on the table) indeed, your query seems to be faster. – axiac Aug 24 '15 at 11:51