2

I have 3 tables I'm joining together to get what users are in a specific area. A scaled back example of the tables:

USER Table (stores all user information) 
ID | Name
----------
 1   John
 2   Joe
 3   Mike 

GEO (has all geo location info; including latitude and longitude; which im excluding for the example )
ID | CITY 
-------------
 1 | ORLANDO
 2 | MIAMI
 3 | DAYTONA

LOCATIONS (stores each users location; each user has multiple locations)
ID | AREA (id = user.id, geo = geo.id)
--------
 1 | 1
 1 | 2
 1 | 3
 2 | 1
 3 | 1
 3 | 3

I've created a function in php to pull the results for a given LAT / LONG withing a certain radius like so (excluding the whole function as its not relevant):

select USER.ID as USERID, (6371 * acos(cos(radians( {$lat})) * cos(radians(g.latitude)) * cos(radians(g.longitude) - radians({$long})) + sin(radians({$lat})) * sin(radians(g.latitude)))) AS distance
            from 
            GEO G 
            join LOCATIONS LOC on LOC.AREA = G.ID
            join USER U on LOC.ID = USERID
            HAVING distance <= {$radius}

Now the issue. This works and pulls all the info, but results in showing the same user multiple times due to the user being in the LOCATIONS table multiple times (ie shows 100 results, with 15 different users)

So my thought was to GROUP BY USER.id; however this only matches the first location for that user; only resulting in 2 results.

I've tried DISTINCT; but the rows are not distinct as the user.id or location.id are a different combo for each row.

I've also tried working backwards with sub queries

SELECT * from USER where id = (
select id from GEO where area = (
select id, (long trig here) as distance) from GEO)

but that wont work as I have to select the trig statement as distance so I can't just just select the id from the GEO table

I'm at my wits end trying to get unique users; but still have it search in all the user locations. I know I could loop the results in php and rebuild them; however this query easily returns thousands of result since each users location is shown in the results and I'd rather not do that for speed purposes.

Any help in the right direction would be much appreciated..

ADDITION

to elaborate the result issue a tad, if you ran this query on ORLANDO with a radius that would extend to DAYTONA, if a user was in DAYTONA you'd get

USER | CITY
-----------
 1  | ORLAND
 1  | DAYTONA
 2  | ORLANDO
 3  | ORLANDO
 3  | DAYTONA

which results in duplicates of user 1 & 3

but when you group by user.id you only get

 USER | CITY
-----------
 2  | ORLANDO

which drops user 1 & 3 since when its grouped it only shows their area as DAYTONA

n.mitchell
  • 23
  • 3

1 Answers1

0

If you use WHERE instead of HAVING you would be able to use GROUP BY / DISTINCT and catch 'm all like so:

SELECT u.id AS USERID
    FROM `GEO` g
    JOIN `LOCATIONS` l ON l.`AREA` = g.`ID`
    JOIN `USER` u ON l.`ID` = u.`ID`
    WHERE (6371 * ACOS(COS(RADIANS({$lat})) * COS(RADIANS(g.latitude)) * COS(RADIANS(g.longitude) - RADIANS({$long})) + SIN(RADIANS({$lat})) * SIN(RADIANS(g.latitude)))) <= {$radius}
    GROUP BY u.`ID`

This may be optimized by using an 'early' pre-aggregated filter. I.e. by applying the WHERE on the ON as early as possible. Though this may look 'weird', it can be significantly faster. In your case this would look like this:

SELECT u.id AS USERID
    FROM `GEO` g
    JOIN `LOCATIONS` l ON 
        (6371 * ACOS(COS(RADIANS({$lat})) * COS(RADIANS(g.latitude)) * COS(RADIANS(g.longitude) - RADIANS({$long})) + SIN(RADIANS({$lat})) * SIN(RADIANS(g.latitude)))) <= {$radius}
        AND l.`AREA` = g.`ID`
    JOIN `USER` u ON l.`ID` = u.`ID`        
    GROUP BY u.`ID`
  • Note that if you'd want to select distance as well you can still put in in the select field list as you did; however, as you're if using DISTINCT you'll get just one, while if using GROUP BY you'd be able to concatenate all the distances
  • I'd recommend trying out both GROUP BY and DISTINCT as performance differences can be quite extreme and unpredictable. (see e.g. this question)
  • Just wondering, but it'd be more efficient to precalculate parts such as ACOS(COS(RADIANS({$lat})) instead of doing it on the fly, any reason to keep it like this?
  • Additionaly, you may want to store the long / lat values in radians for further optimization
Community
  • 1
  • 1
vollie
  • 1,005
  • 1
  • 11
  • 20
  • you my friend are amazing. the MAX kept kicking an invalid use of group by; so I removed MAX and did a select DISTINCT(u.id) and it appears to be working! As for precalculating, are you suggesting doing this first in my php function then passing? do you think that would show a decrease in search time? TBH that part is something I found on another post as the searching by RADIUS was over my head. The location table came from the US zipcode database, which had the plain lat / long. I could rebuild that with radians so it doesnt have to be done in the query; every fraction of a second helps. – n.mitchell Aug 25 '13 at 17:12
  • Exactly, a simple 'rebuild' so that the lon and lat in the db are stored in radians would spped it up (a bit) and make the query simpler as well. Also, the parts with a variable like `ACOS(COS(RADIANS({$lat}))` are - in the duration of yuur query - a constant, so you could ust use php to precalculate it and use the result of that e.g. `$pc_lat = acos(cos(deg2rad($lat)));`. – vollie Aug 25 '13 at 17:20
  • On the subject of MAX, you're quite right, that's an oversight on my part, but, for all i can see: you dont' need it as the MAX of a single value is always that value. I'll update the above accordingly – vollie Aug 25 '13 at 17:23
  • Commented before noticing your revision. As with my above comment, I still get the MAX error; but removing MAX and GROUP BY and replacing with DISTINCT(u.id) works ... Just out of curiosity, any idea why the MAX would give the invalid group by? – n.mitchell Aug 25 '13 at 17:23
  • Haha, i was still editing, now it has been updated. The MAX function is itself an Aggregate funcion, they're not allowed outside the select fields. See http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html – vollie Aug 25 '13 at 17:25
  • I don't think I'll need to CONCAT the distance, but GROUP seems to run faster than distinct so I think I'll use that as you show in your answer. Thanks for the suggestions for the calculations and radians, I plan to implement those. – n.mitchell Aug 25 '13 at 17:44
  • just an FYI, I rebuilt the db using deg2rad and the results are different. for a lat of 28.61 sql RADIANS returns 0.499338698995578. where deg2rad returns 0.49933869899558; which is one digit less, and fails to return the results. – n.mitchell Aug 27 '13 at 02:44
  • Mh, did not account for that one; then again, this should only affect precision (ever so slighlty) and speed. As such, results may theoretically be different but it should definately not fail. Check out http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates for some more info (it's the resouce i once used for implementing the very same thing). – vollie Aug 27 '13 at 03:17