
My script runs too slowly when searching for zipcodes within 30 miles of 90210. It was also returning duplicate rows, even though there are no duplicates in the zipcode database, hence the DISTINCT u.id. Besides selecting only the columns I need, any ideas on how to speed this up?

<?php
$zipcode = queryDB("SELECT * FROM zipcodes WHERE zipcode='$location' LIMIT 1", 'a');

$distance = 30;
$sql = "SELECT DISTINCT zipcode, ( 3959 * acos( cos( radians(".$zipcode['lat'].") ) * cos( radians( lat ) ) * cos( radians( lon ) - radians(".$zipcode['lon'].") ) + sin( radians(".$zipcode['lat'].") ) * sin( radians( lat ) ) ) ) AS distance FROM zipcodes HAVING distance <= $distance ORDER BY distance";
$zipcodes = queryDB($sql, 'r');

$sql = '';
while($row = mysqli_fetch_assoc($zipcodes)){
    $sql .= "u.category='$category' AND u.zipcode='".$row['zipcode']."' AND u.zipcode = l.zipcode";
    if($_GET['photos'] == 'y'){ $sql .= " AND photo_1!=''"; }
    $sql .= " OR ";
}
$sql = rtrim($sql, ' OR ');

$profiles = queryDB("SELECT DISTINCT u.id, u.*, l.city, l.state, l.zipcode FROM user_accounts as u, zipcodes as l WHERE $sql ORDER BY date_added DESC", 'r');
while($row = mysqli_fetch_assoc($profiles)){
//this is where i display the profile information
} ?>

UPDATE: The biggest issue was the missing join in my user_accounts select. Here's my updated SQL query.

$sql = '';
$zipcodes = new ZipCodesRange($location, $distance);
foreach ($zipcodes->zipCodes as $zipcode){
    $sql .= "(u.category='$category' AND u.zipcode='".$zipcode['ZIPCODE']."'";
    if($_GET['photos'] == 'y'){ $sql .= " AND photo_1!=''"; }
    $sql .= ") OR ";
}
$sql = rtrim($sql, ' OR '); // rtrim strips a character set, not a suffix; leaving ')' out of the set keeps the final paren intact


$profiles = queryDB("SELECT u.id, u.photo_1, u.name, u.business, u.date_added, l.city, l.state, l.zipcode FROM user_accounts as u INNER JOIN zipcodes as l ON u.zipcode = l.zipcode WHERE $sql ORDER BY date_added DESC", 'r');
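Since every OR branch repeats the same u.category test, the zipcode list could also be collapsed into a single IN clause. A sketch of an equivalent query follows, with illustrative placeholder values (the category and zipcodes shown are examples, and would ideally be bound as prepared-statement parameters rather than interpolated):

```sql
SELECT u.id, u.photo_1, u.name, u.business, u.date_added,
       l.city, l.state, l.zipcode
FROM user_accounts AS u
INNER JOIN zipcodes AS l ON u.zipcode = l.zipcode
WHERE u.category = 'some_category'      -- placeholder category value
  AND u.zipcode IN ('90210', '90211')   -- zipcodes returned by the radius search
  -- AND u.photo_1 != ''                -- only when the photos filter is on
ORDER BY u.date_added DESC;
```

An IN list keeps the WHERE clause short, avoids the AND/OR precedence pitfall entirely, and gives the optimizer a single indexed column to range-scan.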
HTMLGuy
  • Which RDBMS are you using? – Christian Phillips Sep 10 '13 at 18:14
  • Shane, I don't think this is possible without a DB. I think you need a zipcode database; only then would this be possible. – Dhaval Sep 10 '13 at 18:16
  • Do you have indexes on `u.category` & `u.zipcode`? – Christian Phillips Sep 10 '13 at 18:16
  • First you need to join `u` and `l` tables, preferably using ansi syntax: `u inner join l on u.zipcode = l.zipcode`. Then append all `or` criteria and put them in parentheses: `and ((u.category = ? and u.zipcode = ?) or (u.category = ?2 and u.zipcode = ?2))`. [This is because `and` has higher precedence over `or`.](http://stackoverflow.com/questions/1241142/sql-logic-operator-precedence-and-and-or) – Nikola Markovinović Sep 10 '13 at 18:20
  • MySQL - the zipcode DB is 42k rows, one for each US zipcode. Zipcodes are 5 digits, with lat, long, city, and state-abbreviation columns. Category and zipcode are both indexed. – HTMLGuy Sep 10 '13 at 18:20
  • Is it the first SQL (selecting the ZIP codes) that is slow? Or, is it the second one (Selecting the cities)? – Darius X. Sep 10 '13 at 18:22
  • I believe it's the second one with the user_accounts and cities. – HTMLGuy Sep 10 '13 at 18:23
  • If it is the second, the missing join (see Nikola's comment) could be the culprit. If it is the first one, the answer from David would work (pre-determine latitude and longitude bounds, so you do not have to examine each one). – Darius X. Sep 10 '13 at 18:27
  • Thank you Darius. I'm trying to understand how the calculation works for my situation, but I'll definitely try both and see how much faster I can make this thing (then reuse my code for years to come, because I'm in way over my head right now). – HTMLGuy Sep 10 '13 at 18:32
  • You need both. First you need to fix the query so it does not return meaningless combinations; then you need to apply Shane Stebner's answer to speed the query up. – Nikola Markovinović Sep 10 '13 at 18:36
  • lol my own answer huh – HTMLGuy Sep 10 '13 at 18:38
  • :-) Silly me. David's answer. – Nikola Markovinović Sep 10 '13 at 18:44
  • Ok, so I sped up the zipcode search time, but it's still extremely slow. It didn't really seem to even make a difference. It must be the user_account select that's so slow...I'm going to setup the join now. – HTMLGuy Sep 10 '13 at 21:07
  • You might remove `distinct` now because query should not return duplicates anymore. – Nikola Markovinović Sep 10 '13 at 22:35
  • I did that now too and I'm not sure if that speeds it up, but I would assume so. Thanks again. – HTMLGuy Sep 10 '13 at 22:50

1 Answer


Per @malias's answer to another question explaining the haversine formula, here is a document showing different ways to improve the speed of this SQL: http://tr.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL

The main goal is to limit the set of zipcodes being searched so that the haversine doesn't have to be calculated for every record. The suggestion is to add a WHERE clause that filters on latitude and longitude bounds. Table indexes can be used for this rectangular pre-filter, and the haversine then trims the rectangle down to a true circle.
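As a rough sketch of that pre-filter (the placeholders :lat, :lon, and :dist stand for the center zipcode's coordinates and the radius in miles; one degree of latitude is roughly 69 miles, and a degree of longitude shrinks by cos(latitude)):

```sql
SELECT zipcode,
       ( 3959 * ACOS( COS(RADIANS(:lat)) * COS(RADIANS(lat))
                    * COS(RADIANS(lon) - RADIANS(:lon))
                    + SIN(RADIANS(:lat)) * SIN(RADIANS(lat)) ) ) AS distance
FROM zipcodes
-- rectangular pre-filter: can use an index on lat / lon
WHERE lat BETWEEN :lat - (:dist / 69.0)
              AND :lat + (:dist / 69.0)
  AND lon BETWEEN :lon - (:dist / (69.0 * COS(RADIANS(:lat))))
              AND :lon + (:dist / (69.0 * COS(RADIANS(:lat))))
-- spherical refinement: exact great-circle distance on the survivors
HAVING distance <= :dist
ORDER BY distance;
```

The 3959 constant is the earth's radius in miles, matching the original query; 69.0 (≈ 3959 × π / 180) is the approximate miles per degree of latitude.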

David
  • If you are able to provide an example for my situation, I would be very appreciative, however, I understand if you don't want to take the time. After all, it is my job to figure this out lol. – HTMLGuy Sep 10 '13 at 18:35
  • In theory, your answer makes a lot of sense. I just need to figure out how to build it. I feel confident enough to accept your answer. – HTMLGuy Sep 10 '13 at 18:41
  • For my situation, your answer didn't make much of a difference, however, the table join did and now it's perfect. – HTMLGuy Sep 10 '13 at 21:21