1

I have a query that I use to find results that are ordered by location. Results also have to account for VAT so this is also in the query. The query can unfortunately take 4+ seconds to run when not cached. Can anyone spot any glaringly obvious issues or suggest anything I can do to improve it?

Just to clarify what is happening in the query:

  • The distance is calculation is euclidean distance using lat/long
  • The incvat fields are used to show the price when vat is included
  • The WHEN / THEN statement is used to put prices of 0 at the very bottom

The query:

SELECT * , ROUND( SQRT( POW( ( 69.1 * ( company_branch_lat - 52.4862 ) ) , 2 ) + POW( ( 53 * ( company_branch_lng - - 1.8905 ) ) , 2 ) ) , 1 ) AS distance, 
    hire_car_day + ( hire_car_day * 0.2 * ! hire_car_incvat ) AS hire_car_day_incvat, 
    hire_car_addday + ( hire_car_addday * 0.2 * ! hire_car_incvat ) AS hire_car_addday_incvat, 
    hire_car_week + ( hire_car_week * 0.2 * ! hire_car_incvat ) AS hire_car_week_incvat, 
    hire_car_weekend + ( hire_car_weekend * 0.2 * ! hire_car_incvat ) AS hire_car_weekend_incvat
FROM hire_car
LEFT JOIN company_branch ON company_branch_id = hire_car_branchid
LEFT JOIN hire_cartypelink ON hire_cartypelink_carhireid = hire_car_id
LEFT JOIN users ON company_branch_userid = user_id
WHERE 1 
GROUP BY hire_car_id
HAVING distance <=30
ORDER BY CASE hire_car_day_incvat
WHEN 0 
THEN 40000 
ELSE hire_car_day_incvat
END , distance ASC 
LIMIT 0 , 30
Chris
  • 26,744
  • 48
  • 193
  • 345
  • probably query is making complete table scan to calculate distance for all rows to use filter `having distance <=30`. If you have too much rows this might took time. Check your query with adding EXPLAIN before select and you will see bottlenecks. – Nesim Razon Nov 27 '12 at 22:19
  • 2
    Don't use `SQRT()`, square the initial distance and use that instead (`HAVING sqDistance < 900`). The distance calculation isn't completely correct on a spherical surface (you can find the formula on SO, but it's more expensive). Add box-bounds checking to eliminate most rows, and use indices (eg `WHERE company_branch_lat > 52 AND company_branch_lat < 53`). The optimizer probably doesn't like `ORDER BY CASE...`, because it _may_ mess with a simple index-scan; if you're worried about '0' prices, put the `CASE` in the `SELECT` (or use two columns). What's up with the `0.2`? – Clockwork-Muse Nov 27 '12 at 23:36
  • @Clockwork-Muse Thanks for the idea about dropping the "root" of the SQRT. Any ideas how I can improve the `ORDER BY CASE` whilst still maintaining the functionality I need? 0.2 is 20% tax in the UK – Chris Nov 28 '12 at 09:16
  • What's the type of the `0.2` - I suspect it's going to default to `REAL` or `FLOAT`, at which point your answers may be subtly wrong. What does `! hire_car_incvat` result in? _Why_ do you have to put '0' prices on the 'bottom' (and what does `hire_car_day_incvat` look like anyways)? Also, that `GROUP BY` is subject to mySQL's special rules on the subject - I don't know enough about your schema to tell if you're getting incorrect results. – Clockwork-Muse Nov 28 '12 at 21:08

2 Answers2

2

You can use the mysql spatial extension and save the latitude and longitude as a point datatype and make it a spatial index. That way you can reorder the coordinates along a curve and reduce the dimension and preserve spatial information. You can use the spatial index as a bounding box to filter the query and then use the harvesine formula to pick the optimal result. Your bounding box should be bigger then the radius of the great circle. Mysql uses a rtree with some spatial index and my example was about a z curve or a hilbert curve: https://softwareengineering.stackexchange.com/questions/113256/what-is-the-difference-between-btree-and-rtree-indexing. Then you can insert a geocoordinate directly into a point column: http://dev.mysql.com/doc/refman/5.0/en/creating-spatial-values.html. Or you can use a geometry datatype: http://markmaunder.com/2009/10/10/mysql-gis-extensions-quick-start/. Then you can use MBRcontains function like so: http://dev.mysql.com/doc/refman/4.1/en/relations-on-geometry-mbr.html or any other functions: http://dev.mysql.com/doc/refman/5.5/en/functions-for-testing-spatial-relations-between-geometric-objects.html. Hence you need a bounding box. Here are some examples:

Here is a simple example with point datatype:

    CREATE SPATIAL INDEX sx_place_location ON place (location)

    SELECT  * FROM    mytable
    WHERE   MBRContains
           (
           LineString
                   (
                   Point($x - $radius, $y - $radius),
                   Point($x + $radius, $y + $radius)
                   )
           location
           )
    AND Distance(Point($x, $y), location) <= $radius

I'm not sure if it works because it's uses a radius variable with a bounding-box function. It's seems to me MBRwithin is a bit simpler, because it doesn't need any argument: Mysql: Optimizing finding super node in nested set tree.

Community
  • 1
  • 1
Micromega
  • 12,486
  • 7
  • 35
  • 72
0

You are using GROUP BY statement together with HAVING, although I don't see anywhere in the query any aggregate functions. I recommend you to re-write the query like this and see if it makes any difference

SELECT * , ROUND( SQRT( POW( ( 69.1 * ( company_branch_lat - 52.4862 ) ) , 2 ) + POW( ( 53 * ( company_branch_lng - - 1.8905 ) ) , 2 ) ) , 1 ) AS distance, 
hire_car_day + ( hire_car_day * 0.2 * ! hire_car_incvat ) AS hire_car_day_incvat, 
hire_car_addday + ( hire_car_addday * 0.2 * ! hire_car_incvat ) AS hire_car_addday_incvat, 
hire_car_week + ( hire_car_week * 0.2 * ! hire_car_incvat ) AS hire_car_week_incvat, 
hire_car_weekend + ( hire_car_weekend * 0.2 * ! hire_car_incvat ) AS hire_car_weekend_incvat
FROM hire_car
LEFT JOIN company_branch ON company_branch_id = hire_car_branchid
LEFT JOIN hire_cartypelink ON hire_cartypelink_carhireid = hire_car_id
LEFT JOIN users ON company_branch_userid = user_id
WHERE ROUND( SQRT( POW( ( 69.1 * ( company_branch_lat - 52.4862 ) ) , 2 ) + POW( ( 53 * ( company_branch_lng - - 1.8905 ) ) , 2 ) ) , 1 ) <= 30
ORDER BY CASE hire_car_day_incvat
WHEN 0 
THEN 40000 
ELSE hire_car_day_incvat
END , distance ASC 
LIMIT 0 , 30
cha
  • 10,301
  • 1
  • 18
  • 26