Optimizing MySQL ORDER BY on a calculation shared with WHERE

Question

I have a MySQL SELECT query that calculates a distance using Pythagoras in the WHERE clause to restrict results to a certain radius.

I also use the exact same calculation in the ORDER BY clause to sort the results by smallest distance first.

Does MySQL calculate the distance twice (once for the WHERE, and again for the ORDER BY)?

If it does, how can I optimize the query so it is only calculated once (if possible at all)?

Could you show an example of the query? It would help a lot. – Ozzy, Nov 14 '10 at 22:19

Mark Byers · Accepted Answer · 2010-11-14T23:36:40.530

Does MySQL calculate the distance twice (once for the WHERE, and again for the ORDER BY)?

No, the calculation will not be performed twice if it is written in exactly the same way. However, if your aim is to improve the performance of your application, you might want to look at the bigger picture rather than concentrating on this minor detail, which could give you at most a factor-of-two difference. A more serious problem is that your query prevents efficient use of indexes and will result in a full table scan.

I would recommend that you change your database so that you use the geometry type and create a spatial index on your data. Then you can use MBRWithin to quickly find the points that lie inside the bounding box of your circle. Once you have found those points you can run your more expensive distance test on those points only. This approach will be significantly faster if your table is large and a typical search returns only a small fraction of the rows.
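
As a rough sketch of the spatial approach: the table name `places_geo`, the centre (15, 55) and the radius 5 are all made up for illustration, and the `ST_`-prefixed function names and the `SRID` column attribute assume MySQL 8.0 (older versions used unprefixed names such as `GeomFromText` and `X()`).

```sql
-- Hypothetical table; every name here is an assumption for illustration.
CREATE TABLE IF NOT EXISTS places_geo (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    pt POINT NOT NULL SRID 0,       -- spatial indexes need a NOT NULL column
    SPATIAL INDEX idx_pt (pt)
);

-- Circle: centre (15, 55), radius 5.
-- Its bounding box spans x in [10, 20] and y in [50, 60].
SELECT id,
       SQRT(POW(ST_X(pt) - 15, 2) + POW(ST_Y(pt) - 55, 2)) AS distance
FROM places_geo
WHERE MBRWithin(
          pt,
          ST_GeomFromText('POLYGON((10 50, 20 50, 20 60, 10 60, 10 50))')
      )                             -- coarse prefilter that can use the spatial index
  AND POW(ST_X(pt) - 15, 2) + POW(ST_Y(pt) - 55, 2) <= POW(5, 2)
                                    -- exact circle test, run only on the survivors
ORDER BY distance;
```

How points lying exactly on the edge of the box are treated by `MBRWithin` is worth checking in the documentation for your version.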

If you can't change the data model then you can still improve the performance by using a bounding box check first, for example WHERE x BETWEEN 10 AND 20 AND y BETWEEN 50 AND 60. The bounding box check will be able to use an index, but because R-Tree indexes are only supported on the geometry type you will have to use the standard B-Tree index which is not as efficient for this type of query (but still much better than what you are currently doing).
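
A minimal sketch of that fallback, assuming a hypothetical table `places_xy` with plain `x` and `y` columns and the same made-up circle (centre (15, 55), radius 5) whose bounding box matches the ranges above:

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE IF NOT EXISTS places_xy (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    x  DOUBLE NOT NULL,
    y  DOUBLE NOT NULL,
    INDEX idx_x_y (x, y)            -- ordinary B-Tree index
);

SELECT id, x, y,
       SQRT(POW(x - 15, 2) + POW(y - 55, 2)) AS distance
FROM places_xy
WHERE x BETWEEN 10 AND 20           -- bounding box: these range checks can use the index
  AND y BETWEEN 50 AND 60
  AND POW(x - 15, 2) + POW(y - 55, 2) <= POW(5, 2)
                                    -- exact circle test on squared distances
ORDER BY distance;
```

Comparing squared distances in the WHERE clause avoids a square root per candidate row; the root is only needed for the value that is returned and sorted on.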

@mark-byers, while I agree in principle, the spatial features of MySQL are somewhat limited and I don't think they will help in this case. One should read the docs carefully, or better yet, do some benchmarking with actual data. – Joshua Martell, Nov 14 '10 at 22:33
Thanks for the suggestion. I am avoiding the spatial stuff for now. I may be forced to use it later, but if so I will likely switch to PostgreSQL, as it has better spatial support. – ljbade, Nov 14 '10 at 22:52
Also, I have already implemented the bounding box optimization and can confirm that it provided a significant improvement. I was just worried that when we get a lot of points inside the radius, the ORDER BY would calculate the distance twice. – ljbade, Nov 15 '10 at 01:49

AndreKR · Answer 2 · score 1 · answered Nov 14 '10 at 22:25

You could select the calculation as a column, filter on it in the HAVING clause and sort on it in the ORDER BY clause. Then the calculation is certainly only done once, but I guess that would be slower, because it has to work with more data. The calculation itself is not that expensive.
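
A sketch of what this could look like, assuming a hypothetical table `places_xy` with `x` and `y` columns and a made-up circle (centre (15, 55), radius 5). It relies on the MySQL extension that lets HAVING refer to a select-list alias without a GROUP BY:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS places_xy (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    x  DOUBLE NOT NULL,
    y  DOUBLE NOT NULL,
    INDEX idx_x_y (x, y)
);

SELECT id, x, y,
       SQRT(POW(x - 15, 2) + POW(y - 55, 2)) AS distance  -- written once
FROM places_xy
HAVING distance <= 5                -- filter on the alias (MySQL extension)
ORDER BY distance;                  -- sort on the same alias
```

Because HAVING is applied after the rows have been produced, this filter on its own cannot use an index; it can be combined with a bounding box condition in the WHERE clause to cut down the rows first.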

Using HAVING without an aggregate function is usually an error. I can't think of any case where it wouldn't be an error, actually. – Vincent Savard, Nov 14 '10 at 22:27
I'll second the calculation not being expensive. You're much more likely to be I/O bound than CPU bound these days. I'd opt for the WHERE over the ORDER BY + LIMIT. You won't have to sort that way. – Joshua Martell, Nov 14 '10 at 22:27
