1

I get a list of suburbs within a specified distance from a single location using Queries A.

I’m trying to adapt Queries A to get a list of suburbs surrounding location1, then get list of suburbs surrounding location2 and so on (I'll call this Queries B). Essentially Queries B is doing the same as Queries A, but repeating it for each separate location. My question- how can I do this using MySQL only. Suggestions on how to do this are much appreciated.


Here is a sample of the data I am working with. SqlFiddle here

CREATE TABLE `geoname` (
    `geonameid` INT(11) NOT NULL,
    `asciiname` VARCHAR(200) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `country` VARCHAR(2) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `latitude` DECIMAL(10,7) NULL DEFAULT NULL,
    `longitude` DECIMAL(10,7) NULL DEFAULT NULL,
    `fcode` VARCHAR(10) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `population` INT(11) NULL DEFAULT NULL,
    `area` INT(11) NULL DEFAULT NULL,
    PRIMARY KEY (`geonameid`),
    INDEX `asciiname` (`asciiname`),
    INDEX `country` (`country`),
    INDEX `latitude` (`latitude`),
    INDEX `longitude` (`longitude`),
    INDEX `fcode` (`fcode`),
    INDEX `population` (`population`),
    INDEX `area` (`area`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
;

INSERT INTO geoname(geonameid, asciiname, country, latitude, longitude, fcode, population, area) VALUES
(2147497, 'Tamworth', 'AU', -31.0904800, 150.9290500, 'PPL', 47597, 72),
(8597559, 'Tamworth', 'AU', -21.0457400, 143.6685200, 'PPL', 0, 0),
(8805708, 'Tamworth', 'AU', -21.0471300, 143.6692000, 'HMSD', 0, 0),
(2655603, 'Birmingham', 'GB', 52.4814200, -1.8998300, 'PPL', 984333, 599),
(4782167, 'Roanoke', 'US', 37.2709700, -79.9414300, 'PPL', 97032, 321),
(10114336, 'East Tamworth', 'AU', -31.0854800, 150.9372100, 'PPLX', 2621, 0),
(10114337, 'North Tamworth', 'AU', -31.0786200, 150.9221900, 'PPPL', 0, 0),
(2143940, 'West Tamworth', 'AU', -31.1023600, 150.9144700, 'PPLX', 0, 0),
(2656867, 'Aston', 'GB', 52.5000000, -1.8833300, 'PPLX', 0, 0),
(2646814, 'Hockley', 'GB', 52.5000000, -1.9166700, 'PPLX', 13919, 0),
(2650236, 'Edgbaston', 'GB', 52.4623000, -1.9211500, 'PPLX', 0, 0),
(4754994, 'Cumberland Forest', 'US', 37.1401300, -80.3217100, 'PPLX', 0, 0),
(4774999, 'Mountain Top Estates', 'US', 37.1376300, -80.3247700, 'PPPL', 0, 0),
(4764119, 'Highland Park', 'US', 37.2237400, -80.3917200, 'PPLX', 0, 0);

What I tried

Queries A- get suburbs surrounding a single point of interest

SELECT @lat := latitude, @lng :=longitude FROM geoname WHERE asciiname = 'Tamworth' and country='AU' and population>0 and fcode='PPL';

SELECT
    name as suburb, 'Tamworth' as point_of_interest, country,
    (
    (
    ACOS(SIN(@lat * PI() / 180) * SIN(latitude * PI() / 180) + COS(@lat * PI() / 180) * COS(latitude * PI() / 180) * COS((
    @lng - longitude
    ) * PI() / 180)) * 180 / PI()
    ) * 60 * 1.851999999962112
    ) AS distance
  FROM geoname
    WHERE fcode='PPLX' OR fcode='PPPL'
  HAVING distance <= '60'
  ORDER BY distance ASC;

RESULTS

The query above returns one location for the point of interest.

+---------------------------------+
|     @lat       |      @lng      |
+---------------------------------+
| 52.6339900     |   -1.6958700   |
+---------------------------------+

and a list of suburbs surrounding Tamworth.

    | point_of_interest |      suburb          | country |           distance |
    |-------------------|----------------------|---------|--------------------|
    |          Tamworth |  East Tamworth       |      AU | 0.9548077598752538 |
    |          Tamworth |  North Tamworth      |      AU | 1.4707125875055387 |
    |          Tamworth |  West Tamworth       |      AU |  1.915025922482298 |

I tried to create Queries B using MySQL user variables, GROUP_CONCAT() and FIND_IN_SET(). The idea was that I could cycle through the values a bit like using an array. I can post my last attempt if you wish, but I am not even close to a solution (not for lack of trying).

UPDATE: Here is one of my last attempts.

SELECT @lat := GROUP_CONCAT(latitude), @lng :=GROUP_CONCAT(longitude), @city :=GROUP_CONCAT(asciiname), @area :=GROUP_CONCAT(area) FROM geoname WHERE (asciiname = 'Tamworth' or asciiname = 'Birmingham' or asciiname = 'Roanoke') and population>0 and fcode='PPL';

SELECT
    FIND_IN_SET(asciiname, @city) as point_of_interest, asciiname as suburb, country,
    (
    (
    ACOS(SIN(FIND_IN_SET(latitude, @lat) * PI() / 180) * SIN(latitude * PI() / 180) + COS(FIND_IN_SET(latitude, @lat) * PI() / 180) * COS(latitude * PI() / 180) * COS((
    FIND_IN_SET(longitude, @lng) - longitude
    ) * PI() / 180)) * 180 / PI()
    ) * 60 * 1.851999999962112
    ) AS distance
  FROM geoname   
  HAVING distance <= FIND_IN_SET(distance, @area)
  ORDER BY distance ASC;

Desired Results for Queries B. For 3 points of interest-Tamworth, Birmingham and Roanoke- this is what I would expect to see.

| point_of_interest |      suburb          | country |           distance |
|-------------------|----------------------|---------|--------------------|
|          Tamworth |  East Tamworth       |      AU | 0.9548077598752538 |
|          Tamworth | North Tamworth       |      AU | 1.4707125875055387 |
|          Tamworth |  West Tamworth       |      AU |  1.915025922482298 |
|        Birmingham |        Aston         |      GB |  2.347111909955497 |
|        Birmingham |       Hockley        |      GB | 2.3581405942861164 |
|        Birmingham |      Edgbaston       |      GB |  2.568384753388139 |
|           Roanoke |    Cumberland Forest |      US |  36.66226789588173 |
|           Roanoke | Mountain Top Estates |      US |  37.02185777044897 |
|           Roanoke |        Highland Park |      US | 40.174566427830094 |

Suggestions on how to do this using MySQL are greatly appreciated.

Strawberry
  • 33,750
  • 13
  • 40
  • 57
TryHarder
  • 2,704
  • 8
  • 47
  • 65
  • This hit tldr a long time ago. And at the end of it, we still wouldn't know what your data set looked like, nor your desired result. – Strawberry Jul 27 '16 at 07:03
  • @Strawberry I've updated the question with the points that you mentioned. Hopefully it will be a little better. – TryHarder Aug 01 '16 at 05:02

2 Answers2

3

You simply need to perform a self-join. Joining tables is a very fundamental part of SQL—you really should read up on it before trying to understand this answer further.

SELECT   poi.asciiname,
         suburb.asciiname,
         suburb.country,
         DEGREES(
           ACOS(
             SIN(RADIANS(   poi.latitude))
           * SIN(RADIANS(suburb.latitude))
           + COS(RADIANS(   poi.latitude))
           * COS(RADIANS(suburb.latitude))
           * COS(RADIANS(poi.longitude - suburb.longitude))
           )
         ) * 60 * 1.852 AS distance
FROM     geoname AS poi
    JOIN geoname AS suburb
WHERE    poi.asciiname IN ('Tamworth', 'Birmingham', 'Roanoke')
     AND poi.population > 0
     AND poi.fcode = 'PPL'
     AND suburb.fcode IN ('PPLX', 'PPPL')
HAVING   distance <= 60
ORDER BY poi.asciiname, distance

See it on sqlfiddle.

You'll have noticed that I've used MySQL's IN() operator as a shorthand for value = A OR value = B OR ....

You'll also have noticed that I've used MySQL's DEGREES() and RADIANS() functions rather than trying to perform such conversions explicitly.

You were then multiplying minutes of latitude by a factor of 1.851999999962112, which was rather strange: it's extremely close to 1.852, which is the precise number of kilometres in a nautical mile (historically defined as a minute of latitude), but yet bizarrely slightly different—I've assumed you meant to use that instead.

Finally, you had the literal value by which you were filtering the distances in the resultset as a string, i.e. '60', whereas obviously this is a numeric value and should be unquoted.

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • why did you join using country? I mean, performance-wise it's probably smart to limit the number of join candidates, but technically op wants a cross join – Jakumi Aug 01 '16 at 12:18
  • @Jakumi: On reflection, I think it is right to remove that restriction—which I have just done in the update. – eggyal Aug 01 '16 at 12:36
  • @eggyal Thankyou!!! I’m glad you asked about 1.851999999962112. The first time I came across the formula for distance, the value used was different and unexplained. According to https://www.marketingtechblog.com/calculate-distance/ it should be 1.1515*1.609344 or 1.853159616 but after reading http://stackoverflow.com/a/389251/691053 quote-“the multiplier 1.1515 corresponds to the old definition of the nautical mile as approximately 6080 ft. Using bc with a scale of 10, I get: `(1852/(3*0.3048))/1760=1.1507794480`” So **1.851999999962112** is derived by `1.1507794480 * 1.609344` – TryHarder Aug 02 '16 at 01:12
  • @oooooo if the answer solves your problem, mark it appropriately please. – Jakumi Aug 02 '16 at 09:53
  • @Jakumi I will in good time. There's no need to rush. I marked the answer up, but I've not had time to go over the solution in detail yet. – TryHarder Aug 02 '16 at 13:23
1

Using Spatial Data Types.

Well first of all if you have a lot of geospatial data, you should be using mysql's geospatial extensions rather than calculations like this. You can then create spatial indexes that would speed up many queries and you don't have to write long drawn out queries like the one above.

Using a comparision with ST_Distance or creating a geometry with the radius of interest along with ST_within might give you good results and could be a lot faster than the current. However the best and fastest way to achieve this, ST_Dwithin isn't implemented yet in mysql.

These data types are available mysql 5.7 onwards but it's totally worth the effort to upgrade your DB if you are in an older version.

The new table structure.

CREATE TABLE `geoname2` (
    `geonameid` INT(11) NOT NULL,
    `asciiname` VARCHAR(200) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `country` VARCHAR(2) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `pt` POINT,
    `fcode` VARCHAR(10) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `population` INT(11) NULL DEFAULT NULL,
    `area` INT(11) NULL DEFAULT NULL,
    PRIMARY KEY (`geonameid`),
    INDEX `asciiname` (`asciiname`),
    INDEX `country` (`country`),
    INDEX `fcode` (`fcode`),
    INDEX `population` (`population`),
    INDEX `area` (`area`),
    SPATIAL INDEX `pt` (`pt`)
)COLLATE='utf8_unicode_ci'
ENGINE=InnoDB;

Notice that the latitude and longitude fields have been replaced by pt and their indexes have been replaced by a single index.

The new query A

SELECT asciiname as suburb, 'Tamworth' as point_of_interest, country,  
  ST_DISTANCE(`pt`, POINT(@lat,@lng)) as distance 
FROM geoname2     
WHERE (fcode='PPLX' OR fcode='PPPL') AND ST_DISTANCE(`pt`, POINT(@lat,@lng))  <= 1
ORDER BY distance ASC;

Clearly it's a lot simpler. It's probably faster too but with only 14 records to test on it's hard to reach any sort of conclusion, no index will be used for such small tables.

Note that ST_DISTANCE results are returned in degrees it's conventionally assumed that 1 degree is about 60 miles or 111 km (you have done so in your calculation)

BTW, In the existing setup, you do have an index on latitude and longitude but please note that mysql can use only one index per table so if you don't adopt geospatial queries you might want to convert that into a single composite index on latitude,longitude.

The full query.

Now the above query can be modified as follows to give the 'query B' in it's new form.

SELECT DISTINCT  g1.asciiname, g2.asciiname ,ST_DISTANCE(g1.pt, g2.pt) *111 as distance FROM geoname2 g1 
INNER JOIN (SELECT `pt`, asciiname  
    FROM geoname2 
     WHERE (fcode='PPLX' OR fcode='PPPL') AND 
       ST_DISTANCE(`pt`, POINT(@lat,@lng))  <= 1) as g2
WHERE ST_DISTANCE(g1.pt,g2.pt) < 1 
AND g1.asciiname != g2.asciiname ORDER BY distance ASC;

Note again that I am assuming 1 degree (approximately 111 km to be close to one another)

e4c5
  • 52,766
  • 11
  • 101
  • 134
  • I've not had a chance to test the code yet, but I think the optimizations make a lot of sense. If I upgrade, I will definitely look at implementing something like this. Thank you. – TryHarder Aug 07 '16 at 13:00
  • Beware that MySQL's spatial extensions use Euclidean geometry, whereas the Haversine formula used in the question calculates great circle distance (spherical geometry). – eggyal Sep 26 '16 at 11:22