4

I have a query in MySQL which runs a stored function on each row of a table and then orders the rows by the result of the function before returning the first 10 rows.

SELECT rowId, MyFunction(x, y, constX, constY) AS funResult
FROM myTable
ORDER BY funResult DESC
LIMIT 10

The problem is that it takes several seconds to run on a table with 10,000 rows which is much too slow. The result of the function can't be computed and stored as another row in the table because it takes a constant which is given by PHP and is different each time the query is run.

The speed of the function itself is not the problem, since removing ORDER BY funResult DESC LIMIT 10 means that the query runs in less than 0.01 seconds.

The problem must be sorting the rows - is there any way this can be done faster, considering the fact that only the first 10 rows are needed?

Update

The simplified function being used calculates the distance between each row and a specified point (where LAT_B and LON_B are constants dependent on the query):

CREATE FUNCTION MyFunction(LAT_A float, LON_A float, LAT_B float, LON_B float)
RETURNS double
DETERMINISTIC
BEGIN

DECLARE tempCalc DOUBLE;
SET tempCalc = 3956 * 2 * ASIN(SQRT( POWER(SIN((LAT_A -abs( LAT_B)) * pi()/180 / 2),2)    
    + COS(LAT_A * pi()/180 ) * COS( abs(LAT_B) *  pi()/180)
    * POWER(SIN((LON_A - LON_B)
    * pi()/180 / 2), 2) ));

RETURN tempCalc;

END
Menelaos
  • 23,508
  • 18
  • 90
  • 155
user882807
  • 373
  • 1
  • 4
  • 17
  • 2
    Please show your function definition. – eggyal Apr 29 '13 at 10:51
  • Yes, we should also see your function definition. – Menelaos Apr 29 '13 at 10:54
  • Try to move your order by and limit directly within your function as opposed to doing them later. Your function can return the 10 results directly sorted and ready. – Menelaos Apr 29 '13 at 11:14
  • Are col1 and col2 columns of myTable? in the return I suspect the multiplication takes precedence over the addition so you want to add col2*3 to tempCalc? – Menelaos Apr 29 '13 at 12:47
  • Yes, col1 and col2 are columns of myTable, and I want the multiplication to be performed first. – user882807 Apr 29 '13 at 12:51
  • @user882807 Interesting... seems http://stackoverflow.com/questions/15296321/mysql-performance-when-ordering-on-calculated-column is also similar (almost identical) problem with long and lat and ordering by calculated distance. – Menelaos Apr 29 '13 at 13:05
  • 1
    @user882807 I think your most efficient option would be to do this: http://stackoverflow.com/questions/1006654/fastest-way-to-find-distance-between-two-lat-long-points – Menelaos Apr 29 '13 at 13:18
  • Related: http://stackoverflow.com/questions/12494146/mysql-geo-search-having-distance-performance – Menelaos Apr 29 '13 at 13:36
  • ¿Why don't use a simple Pitagoras? – Justo Jul 13 '17 at 02:55

5 Answers5

3

Options:

  1. Incorporate sorting within your stored procedure definition/logic. If your calling SQL select within your stored procedure perform the sort and limit there. - This means you will not be producing 10,000 rows in the stored procedure, just to resort them. Also, if table has indexes original sort within SQL select may be much faster.

  2. Verify that indexing is used within your table. - Indexes will cause your sorts to be performed quicker when selecting on the table.

Please provide us with the function definition, it would be easier to additionally help you.

Finally, try to move your order by and limit directly within your function as opposed to doing them later. Your function can return the 10 results directly sorted and ready. If you want, make two functions - one that returns the full results and one that returns them limited and sorted.

Update:

After seeing your function it becomes apparent that your trying to order by a calculated value. Ordering by calculated values is extremely slow as also mentioned in:

I am trying to think how you could "pre-process/order" your data based on col1 or col2 in order to speed up the ultimate ordering of your results. If col1 and col2 are columns of the table, and funResult is a mathematical function that can be graphed one of the two has a higher effect on the function return value....

Finally, if col1 and col2 are columns of myTable, you do not need to use a stored function but could query with, but this wouldn't make a large difference...Your main problem is ordering by a calculated function:

SELECT rowId, ((col1-INPUT_CONST)*2)+(col2*3) AS funResult
FROM myTable
ORDER BY funResult DESC
LIMIT 10

Update 2:

After digging for the problem of sorting be calculated distance I found that this has been asked and solved in a very efficiently at the link below. In relation to sorting by a calculated value, as your sorting by a calculated value it is inherently slow. See the following two links for additional help:

Finally, the closest that comes to your answer is this: https://stackoverflow.com/a/4180065/1688441

Community
  • 1
  • 1
Menelaos
  • 23,508
  • 18
  • 90
  • 155
  • Thanks - I have added the function definition but I don't know how to incorporate sorting and limiting into the definition itself – user882807 Apr 29 '13 at 12:25
  • The [XY problem](http://meta.stackexchange.com/a/66378) strikes again. Sigh. +1 for perseverance; I suspect the spatial index approach [detailed by @Quassnoi](http://stackoverflow.com/a/1006668) in one of the questions to which you link may be the best way forward here. – eggyal Apr 29 '13 at 14:30
  • @eggyal Completely... this happens so many times when User asks Y, leaves out tons of details...and in the end you realize he's really asking Z which may be a different variation of already asked/answered question A. – Menelaos Apr 29 '13 at 14:49
2

Expanding your function:

MyFunction(col1, col2, constant) = (col1 - constant) * 2.0 + col2 * 3.0
                                 = 2*col1 + 3*col2 - 2*constant

Therefore ordering by MyFunction(col1, col2, constant) is equivalent to ordering by 2*col1 + 3*col2 irrespective of the constant supplied. Therefore, you can cache the result of that calculation in a new, indexed column:

ALTER TABLE myTable
  ADD COLUMN tmpResult FLOAT,
  ADD INDEX (tmpResult);

CREATE TRIGGER ins BEFORE INSERT ON myTable FOR EACH ROW
  SET NEW.tmpResult := 2*NEW.col1 + 3*NEW.col2;

CREATE TRIGGER upd BEFORE UPDATE ON myTable FOR EACH ROW
  SET NEW.tmpResult := 2*NEW.col1 + 3*NEW.col2;

UPDATE myTable SET tmpResult = 2*col1 + 3*col2;

Then your SELECT becomes:

SELECT   rowId, tmpResult - 2*constant AS funResult
FROM     myTable
ORDER BY tmpResult DESC
LIMIT    10
eggyal
  • 122,705
  • 18
  • 212
  • 237
1

I'd guess that your problem is the time it takes your function to execute. If you execute this query:

SELECT rowId, MyFunction(col1, col2, constant) AS funResult
FROM myTable
LIMIT 10

the database has to:

  • compute the function result for 10 rows
  • return these 10 rows

In contrast, if you execute this query:

   SELECT rowId, MyFunction(col1, col2, constant) AS funResult
   FROM myTable
   ORDER BY funResult DESC
   LIMIT 10

the database has to

  • compute the function result for all 10000 rows in the table
  • sort 10000 rows
  • return the first 10 rows

So, to actually know whether your function is the bottleneck or not, you should ensure that you actually compute the function result for all 10000 rows for both queries and check whether the difference persists.

Frank Schmitt
  • 30,195
  • 12
  • 73
  • 107
1

It's actually considerably faster in mysql to do this

select * from database order by 3956 * 2 * ASIN(SQRT( POWER(SIN((LAT_A -abs( LAT_B)) * pi()/180 / 2),2) + COS(LAT_A * pi()/180 ) * COS( abs(LAT_B) * pi()/180) * POWER(SIN((LON_A - LON_B) * pi()/180 / 2), 2) ));

than it is to order by a custom function.

It's ugly but alot faster.

Try doing an explain on it. For some reason mysql uses a temporary table when there is a function involved but not when there is just math.

0

try this

  SELECT rowId, MyFunction(col1, col2, constant) AS funResult
  FROM myTable
  ORDER BY MyFunction(col1, col2, constant)  DESC
  LIMIT 10
echo_Me
  • 37,078
  • 5
  • 58
  • 78