I have a MySQL SELECT query that calculates a distance using Pythagoras in the WHERE clause to restrict results to a certain radius.

I also use the exact same calculation in the ORDER BY clause to sort the results by smallest distance first.

Does MySQL calculate the distance twice (once for the WHERE, and again for the ORDER BY)?

If it does, how can I optimize the query so it is only calculated once (if possible at all)?
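
For concreteness, a query of the kind described might look like the sketch below. The original query was not posted, so the table `points`, its columns `id`, `x` and `y`, the centre (15, 55) and the radius 5 are all hypothetical.

```sql
-- Hypothetical reconstruction: the same distance expression appears in both
-- the WHERE clause and the ORDER BY clause.
SELECT id, x, y
FROM points
WHERE SQRT(POW(x - 15, 2) + POW(y - 55, 2)) <= 5
ORDER BY SQRT(POW(x - 15, 2) + POW(y - 55, 2));
```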

ljbade

2 Answers

Does MySQL calculate the distance twice (once for the WHERE, and again for the ORDER BY)?

No, the calculation will not be performed twice if it is written in exactly the same way. However, if your aim is to improve the performance of your application, you might want to look at the bigger picture rather than concentrating on this minor detail, which could give you at most a factor-of-two difference. A more serious problem is that your query prevents efficient use of indexes and will result in a full table scan.

I would recommend that you change your database so that you use the geometry type and create a spatial index on your data. Then you can use MBRWithin to quickly find the points that lie inside the bounding box of your circle. Once you have found those points you can run your more expensive distance test on those points only. This approach will be significantly faster if your table is large and a typical search returns only a small fraction of the rows.
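
As a rough sketch of that approach, the following assumes a hypothetical table `places` with a `POINT` column `pt`, a search centre of (15, 55) and a radius of 5. Function names follow MySQL 5.7 and 8.0; older versions spell them `GeomFromText`, `X` and `Y`. Whether the spatial index is actually used depends on the server version and storage engine, so check with EXPLAIN.

```sql
-- Hypothetical table. A SPATIAL index requires a NOT NULL geometry column.
-- On MySQL 8.0, declare the column as POINT NOT NULL SRID 0 so that the
-- optimizer can use the index.
CREATE TABLE places (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  pt POINT NOT NULL,
  SPATIAL INDEX idx_places_pt (pt)
);

-- Search centre and radius (example values).
SET @cx = 15, @cy = 55, @r = 5;

-- Bounding box of the circle as a closed polygon ring.
SET @box = ST_GeomFromText(CONCAT('POLYGON((',
  @cx - @r, ' ', @cy - @r, ',',
  @cx + @r, ' ', @cy - @r, ',',
  @cx + @r, ' ', @cy + @r, ',',
  @cx - @r, ' ', @cy + @r, ',',
  @cx - @r, ' ', @cy - @r, '))'));

-- Cheap bounding-box test first, exact circle test on the survivors.
SELECT id,
       SQRT(POW(ST_X(pt) - @cx, 2) + POW(ST_Y(pt) - @cy, 2)) AS distance
FROM places
WHERE MBRWithin(pt, @box)
  AND POW(ST_X(pt) - @cx, 2) + POW(ST_Y(pt) - @cy, 2) <= POW(@r, 2)
ORDER BY distance;
```

The circle test compares squared distances, so the square root is only needed for the value that is returned and sorted on.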

If you can't change the data model, you can still improve performance by using a bounding box check first, for example WHERE x BETWEEN 10 AND 20 AND y BETWEEN 50 AND 60. The bounding box check can use an index, but because R-Tree indexes are only supported on the geometry type, you will have to use a standard B-Tree index, which is not as efficient for this type of query (but still much better than what you are currently doing).
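
A minimal sketch of the bounding box variant, assuming a hypothetical table `points` with numeric columns `x` and `y`, and a circle of radius 5 centred on (15, 55), which gives the box from the example above:

```sql
-- Hypothetical composite B-Tree index on the coordinate columns.
CREATE INDEX idx_points_x_y ON points (x, y);

SELECT id, x, y,
       SQRT(POW(x - 15, 2) + POW(y - 55, 2)) AS distance
FROM points
WHERE x BETWEEN 10 AND 20                                -- box: centre x +/- radius
  AND y BETWEEN 50 AND 60                                -- box: centre y +/- radius
  AND POW(x - 15, 2) + POW(y - 55, 2) <= POW(5, 2)       -- exact circle test
ORDER BY distance;
```

The range conditions narrow the candidate rows before the distance expression is evaluated, so the exact test only runs on points that are already inside the box.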

Mark Byers
  • @mark-byers, while I agree in principle, the spatial features of MySQL are somewhat limited and I don't think they will help in this case. One should read the docs carefully, or better yet, do some benchmarking with actual data. – Joshua Martell Nov 14 '10 at 22:33
  • Thanks for the suggestion. I am avoiding the spatial stuff currently and may be forced to use it later but I will likely switch to PostgreSQL if I do as it has better support for spatial stuff. – ljbade Nov 14 '10 at 22:52
  • Also, I have already implemented the bounding box optimization and can confirm that it provided a significant improvement. I was just worried that when we get a lot of points inside the radius, the ORDER BY would calculate the distance twice. – ljbade Nov 15 '10 at 01:49
You could select the distance as a column, filter on it in the HAVING clause, and reuse it in the ORDER BY clause. The calculation is then certainly done only once, but I guess that would be slower, because it has to work with more data. The calculation itself is not that expensive.
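
A sketch of what that could look like, assuming a hypothetical table `points` with columns `id`, `x` and `y`, a centre of (15, 55) and a radius of 5. The first form relies on a MySQL extension; the second is a more portable alternative that was not part of the suggestion above.

```sql
-- MySQL extension: HAVING may refer to a SELECT alias, even without GROUP BY.
SELECT id, x, y,
       SQRT(POW(x - 15, 2) + POW(y - 55, 2)) AS distance
FROM points
HAVING distance <= 5
ORDER BY distance;

-- Alternative: compute the alias in a derived table and filter outside it.
SELECT id, x, y, distance
FROM (
  SELECT id, x, y,
         SQRT(POW(x - 15, 2) + POW(y - 55, 2)) AS distance
  FROM points
) AS d
WHERE distance <= 5
ORDER BY distance;
```

Whether either form saves any work depends on the server version and the execution plan, so EXPLAIN and a benchmark on real data are the way to find out.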

AndreKR
  • Using HAVING without an aggregate function is usually an error. I can't think of any case where it wouldn't be an error, actually. – Vincent Savard Nov 14 '10 at 22:27
  • I'll second the calculation not being expensive. You're much more likely to be IO bound than CPU bound these days. I'd opt for the WHERE over the ORDER BY + LIMIT. You won't have to sort that way. – Joshua Martell Nov 14 '10 at 22:27