suppose I've a table of locations(latitude, longitude) and I want to get all the location within a specified range to my current location, my question which is better: to include the calculations in the where clause of the query, something like, select * from locations where /* formula of cosines and sines of the latitude and longitude */ < distance
.
my second choice is to get all the locations and do the calculations locally in the program. I'm using java and mysql if it matters

- 3,427
- 3
- 23
- 36
-
i would prefer the where clause 'cause queries and databases are high optimized and the column could be indexed. – Zelldon Sep 17 '15 at 18:09
-
2Regardless of the performance difference between doing the calculation in the database query or in code: for searching e.g. the nearest 10 locations, the db will search them by using the above query and return those ten entries. For doing the calculation in code, you might be forced to fetch all existing datasets to calculate their distance - otherwise you can not be sure, which one is the nearest. – dhh Sep 17 '15 at 18:09
-
2Can you use PostgreSQL and PostGIS instead? Performance will be far, _far_ better. – Matt Ball Sep 17 '15 at 18:14
-
@dhh doesn't the database server will check all the locations and then return those ten? – monim Sep 17 '15 at 18:14
-
[related](http://stackoverflow.com/questions/159255/what-is-the-ideal-data-type-to-use-when-storing-latitude-longitudes-in-a-mysql); references [mysql spatial extensions](http://dev.mysql.com/doc/refman/5.6/en/spatial-extensions.html); there are functions for calculating distance, so you won't have to use sines/cosines. – Kenney Sep 17 '15 at 18:15
-
Database; thought I would suggest one with spatial capabilities. – Gavin Sep 17 '15 at 18:17
-
1Of course, the database server will only return ten if you do those calculations in your query. If you calculate the distance in code, you must fetch all locations which will result in a really large amount of data depending on the size of your data source. – dhh Sep 17 '15 at 18:17
-
@Zelldon in most cases, performing such calculations in the WHERE negates the benefits of indexes; just calling DATE() on a timestamp field necessitates a full table scan. – Uueerdo Sep 17 '15 at 18:23
3 Answers
Depending on the data and complexity of the calculation, you might try performing a mix of the two. Use a simpler, less precise, but more expedient calculation (capable of taking advantage of indices) to reduce the results sent back for final processing.
For example, if the criteria is "5 miles away", instead of performing the equation involving geometric functions; you could calculate a bounding box. Retrieve the rows in that range, and then use the more complex function to exclude the "corners".
Alternately, if your server is powerful and expected clients relatively weak computationally speaking; it might be better to off load those calculations to the server anyway.
Edit: Another possibility is to put the more complex calculation in a HAVING clause, keeping the simpler one in the WHERE, that way it will only need performed on the ones that pass the simpler, more index friendly, filter.
Edit2: To give a generic example (since I am unacquainted with using long and lat; your equations probably need to account for the globe "wrap").
SELECT *
FROM theTable
WHERE x BETWEEN [minX] AND [maxX]
AND y BETWEEN [minY] AND [maxY]
HAVING POW(x-[originX], 2) + POW(y-[originY], 2) <= POW([distance], 2)
;

- 15,723
- 1
- 16
- 21
-
Yes absolutely also someone great once said that: operation which should be done at DB end should and must be performed at DB end rather doing it at App end and increasing complexity. Nowadays, all DBMS are powerful enough to do any sort of calculation but YES based on complexity ans ease it's sometimes better to offload that to client code. – Rahul Sep 17 '15 at 18:34
-
another thing regardless the design and complexity, I know some programming languages are faster than others, doesn't that apply between programming languages and RDBMSs? i.e. which is faster java or mysql in doing the calculations ? – monim Sep 20 '15 at 07:12
-
Generally speaking, a programming language should always be faster than a RDBMS. I say "generally" and "should" because RDBMS are designed and optimized for certain categories of tasks, by developers specializing in those kinds of algorithms; those should run better in a program, but that is highly dependent on the skill of the programmer and the underlying libraries used. – Uueerdo Sep 21 '15 at 16:06
-
Another matter that could affect the relative performance is that a RDBMS server may already have the data in memory, ready to process, possible even have the answers cached; but a program would need to load the data from disk (or across a network.... from perhaps an RDBMS). – Uueerdo Sep 21 '15 at 16:12
-
@Uueerdo, I'd actually argue the other way around. If you need to do row-by-row processing with a lot of conditional logic per row, then a programming language will probably be faster, but if you're doing general analysis, searching, etc. it will almost always be faster in the DB. – zelarian Mar 31 '22 at 13:42
For small datasets, it does not matter how you perform the query.
For large datasets, either way is problematic...
If you fetch all the data, then do the work in the client, that's a lot of network traffic. Plus lots of calculating.
If you check the distance in SQL, that's a lot of calculating there, and the entire dataset is scanned, unless...
If you limit the rows to check to a "square" around the target point, you can sort of cut down on the calculations. It would require INDEX(lat)
or INDEX(lng)
. Alas, INDEX(lat, lng)
does not help.
I have tackled this problem and come up with a complex answer that involves a Stored Procedure, PARTITIONing
, and iterating. But it (mostly) limits itself to the "square", hence is very efficient for large datasets. The explanation and code are here.

- 135,179
- 13
- 127
- 222
You could use a stored procedure to minimize traffic of the same query to the server. You can even do the calculation on the client side browser if you want, and avoid any calculation on your systems.

- 6,929
- 1
- 17
- 47