7

I have the following SQL:

select code, distance from places;    

The output is below:

CODE    DISTANCE            LOCATION
106     386.895834130068    New York, NY
80      2116.6747774121     Washington, DC
80      2117.61925131453    Alexandria, VA
106     2563.46708627407    Charlotte, NC

I want to be able to just get a single code and the closest distance. So I want it to return this:

CODE    DISTANCE            LOCATION
106     386.895834130068    New York, NY
80      2116.6747774121     Washington, DC

I originally had something like this:

SELECT code, min(distance), location
GROUP BY code
HAVING distance > 0 
ORDER BY distance ASC

The min worked fine if I didn't want to get the correct location that was associated with the least distance. How do I get the min(distance) and the correct location (depending on the ordering on the inserts in the table, sometimes you could end up with the New York distance but the Charlotte in Location).

cdub
  • 24,555
  • 57
  • 174
  • 303
  • 2
    Telling the DBMS up front woulda been nice.... – ErikE Jul 27 '12 at 08:24
  • chris, why you wonder for performance so much under each answer? wouldn't you execute proposed queries once and buffer the results in order to obtain simple `code 1:1 closest location` relationship? as far as I'm concerned distances between codes and locations do not change very often... – Kuba Wyrostek Jul 27 '12 at 08:41

3 Answers3

9

To get the correct associated location, you'll need to join a subselect which gets the minimum distance per code on the condition that the distance in the outer main table matches with the minimum distance derived in the subselect.

SELECT a.code, a.distance
FROM   places a
INNER JOIN
(
    SELECT   code, MIN(distance) AS mindistance
    FROM     places
    GROUP BY code
) b ON a.code = b.code AND a.distance = b.mindistance
ORDER BY a.distance
Zane Bien
  • 22,685
  • 6
  • 45
  • 57
  • how's the performance on this for like 100,000s of locations? – cdub Jul 27 '12 at 08:31
  • @chris, since you're using MySQL, this is likely the most efficient solution you will find. You'll have to make sure you have the proper indexes set up on `code` and `distance` fields. – Zane Bien Jul 27 '12 at 08:34
  • yeah code is a PK so that's okay, but distance is calcualted with math (using lat and long and google's map api) – cdub Jul 27 '12 at 08:37
  • distance is calculated like this: – cdub Jul 27 '12 at 08:40
  • 3959 * acos( cos( radians(".$lat.") ) * cos( radians( TeacherLocation.latitude ) ) * cos( radians( TeacherLocation.longitude ) - radians(".$long.") ) + sin( radians(".$lat.") ) * sin( radians( TeacherLocation.latitude ) ) ) ) AS distance – cdub Jul 27 '12 at 08:40
  • the $lat and $long are passed in to the query, so do I have to have this calculating done in both places in your sql (for a.distance and b.mindistance OR can I just do it in a.distance? – cdub Jul 27 '12 at 08:41
  • @chris Okay, if that's calculated on the fly, and you're trying to get the minimum of that calculation: what you may want to do to make this much faster is to create an additional DECIMAL column called `distance` which would store the result of the calculation (either done in PHP beforehand or via a mass UPDATE). That way, you'll be able to set up an index on the distance and use that column instead. – Zane Bien Jul 27 '12 at 08:43
  • @chris You'll actually need to do it in all places where you're making the distance comparison (the join, and within `MIN`). – Zane Bien Jul 27 '12 at 08:46
  • @chris, no solution is likely to be fast if you're making comparisons on a ***calculated*** value in a table with ~100,000 rows. Consider the optimization technique I suggested above. – Zane Bien Jul 27 '12 at 08:49
  • and consider that if you are looking for a closest location for each code (and that's what you do) - you only need to calculate results using Zane's query only once and forever. – Kuba Wyrostek Jul 27 '12 at 08:51
  • i am trying to do something like finding the closest person too you on the fly, since users keep joining and adding new locations, how do you not keep this from be calculated on the fly, plus my location changes regularly to (which is the input to the query) – cdub Jul 27 '12 at 08:53
  • 1
    @chris, if performance becomes suboptimal for your needs, you may want to look into using a [spatial database](http://en.wikipedia.org/wiki/Spatial_database) instead of a relational one. Relational databases will only get you so far with these types of queries, but spatial databases are much more geared towards them. – Zane Bien Jul 27 '12 at 08:59
  • having some trouble still, lets say distance and code where in separat tables, like User.code has many Locations.id and Locations.distance, how would the joins work? – cdub Jul 27 '12 at 09:08
  • @chris, I can't really help you much without knowing what your table schema / structure looks like. Please post a schema. Thanks. – Zane Bien Jul 27 '12 at 09:12
  • 2
    Use a "bounding box" to get a shorter candidate list, then use the distance function. That lets you exclude most of the 100,000. – ErikE Jul 27 '12 at 16:00
  • @ErikE I cannot +1 your comment enough! I had to do this recently and it that is exactly how I did it. See the article [here](http://www.movable-type.co.uk/scripts/latlong-db.html). I would also recommend to let the user increase the "bounding box" area in case there are no returns within the default return. For instance if default is 100 miles and there are no locations within 100 miles than have a button to click and calculate for say 200 miles... – amaster Oct 11 '13 at 16:02
  • @amaster507 Thank you! I looked at your page, and notice that your code isn't allowing for the curvature of the Earth (and yes, it really matters). Please see [this blog post](http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-zipcode-latitude-longitude-pr) for a formula. – ErikE Oct 11 '13 at 22:58
0

You can try to do a nested lookup between the minimum grouping and the original table.

This seems to do the trick

SELECT MinPlaces.Code, MinPlaces.Distance, Places.Location 
FROM Places INNER JOIN
(
    SELECT Code, MIN(Distance) AS Distance
    FROM Places
    GROUP BY Code
    HAVING MIN(Distance) > 0 
) AS MinPlaces ON Places.Code = MinPlaces.Code AND Places.Distance = MinPlaces.Distance
ORDER BY MinPlaces.Distance ASC

UPDATE: Tested using the following:

DECLARE @Places TABLE ( Code INT, Distance FLOAT, Location VARCHAR(50) )

INSERT INTO @Places (Code, Distance, Location)
VALUES
(106, 386.895834130068, 'New York, NY'),
(80, 2116.6747774121, 'Washington, DC'),
(80, 2117.61925131453, 'Alexandria, VA'),
(106, 2563.46708627407, 'Charlotte, NC')

SELECT MinPlaces.Code, MinPlaces.Distance, P.Location 
FROM @Places P INNER JOIN
(
    SELECT Code, MIN(Distance) AS Distance
    FROM @Places
    GROUP BY Code
    HAVING MIN(Distance) > 0 
) AS MinPlaces ON P.Code = MinPlaces.Code AND P.Distance = MinPlaces.Distance
ORDER BY MinPlaces.Distance ASC

And this yields:

enter image description here

tobias86
  • 4,979
  • 1
  • 21
  • 30
0

You did not say your DBMS. The following solutions are for SQL Server.

WITH D AS (
   SELECT code, distance, location,
      Row_Number() OVER (PARTITION BY code ORDER BY distance) Seq
   FROM places
)
SELECT *
FROM D
WHERE Seq = 1

If you have a table with unique Codes, and an index in your Places table on [Code, Distance] then a CROSS APPLY solution could be better:

SELECT
   X.*
FROM
   Codes C
   CROSS APPLY (
      SELECT TOP 1 *
      FROM Places P
      WHERE C.Code = P.Code
      ORDER BY P.Distance
   ) X

I cannot work on a solution for mysql unti much later.

P.S. You cannot rely on insertion order. Do not try!

ErikE
  • 48,881
  • 23
  • 151
  • 196
  • what do you mean unique codes, as the sample i gave has duplicate codes – cdub Jul 27 '12 at 08:13
  • If you have a separate table listing all the Codes with 1 row per code! – ErikE Jul 27 '12 at 08:15
  • yes i know i can't rely on insertion order. anyway my code is actual a user.id which comes from a user table and links to a locations table with distance and location – cdub Jul 27 '12 at 08:16
  • then yes there are unique and they have a one to many relationship with locations table – cdub Jul 27 '12 at 08:18
  • is it in mysql and is the performance fast as i'll have 100,000s of locations – cdub Jul 27 '12 at 08:19