3

This is a question about calculating the distance between two points of latitude and longitude on the earth using a haversine formula, for use in projects where you need to have a 'find my nearest' function.

The haversine formula is well discussed and solved in MySQL in this post.

I then asked this question about turning it into a stored function so that its available for future projects without having to lookup, remember or re-type the formula in its long form.

Its all good. Except my function differs in results (slightly) to just typing the formula directly into the query, all other things being equal. Why is this?

So here's the function I wrote:

DELIMITER $$

DROP FUNCTION IF EXISTS haversine $$

CREATE FUNCTION `haversine`
    (fromLatitude FLOAT,
     fromLongitude FLOAT,
     toLatitude FLOAT,
     toLongitude FLOAT,
     unit VARCHAR(20)
     )
    RETURNS FLOAT
    DETERMINISTIC    
    COMMENT 'Returns the distance on the Earth between two known points of longitude and latitude'
    BEGIN
    DECLARE radius FLOAT;
    DECLARE distance FLOAT;

    IF unit = 'MILES' THEN SET radius = '3959';
    ELSEIF (unit = 'NAUTICAL_MILES' OR unit='NM') THEN SET radius = '3440.27694';   
    ELSEIF (unit = 'YARDS' OR unit='YD') THEN SET radius = '6967840';
    ELSEIF (unit = 'FEET' OR unit='FT') THEN SET radius = '20903520';
    ELSEIF (unit = 'KILOMETRES' OR unit='KILOMETERS' OR unit='KM') THEN SET radius = '6371.3929';
    ELSEIF (unit = 'METRES' OR UNIT='METERS' OR unit='M') THEN SET radius = '6371392.9';
    ELSE SET radius = '3959'; /* default to miles */
    END IF;

    SET distance = (radius * ACOS(COS(RADIANS(fromLatitude)) * COS(RADIANS(toLatitude)) * COS(RADIANS(toLongitude) - RADIANS(fromLongitude)) + SIN(RADIANS(fromLatitude)) * SIN(RADIANS(toLatitude))));

    RETURN distance;
    END$$

DELIMITER ;

Here's a set of test queries set to find the distance between the London Eye and Buckingham Palace, just for an example. Obviously normally you'd substitute the destination with fields from your database of geo-located 'things' that you want to compare with.

SET @milesModifier = 3959;

SET @myLat = 51.503228;
SET @myLong = -0.119703;

SET @destLat = 51.501267;  
SET @destLong = -0.142697;

SELECT  @kilometerModifier AS radius,
    @myLat AS myLat,
    @myLong AS myLong,
    @destLat AS destLat,
    @destLong AS destLong,
    (@milesModifier * ACOS(COS(RADIANS(@myLat)) * COS(RADIANS(@destLat)) * COS(RADIANS(@destLong) - RADIANS(@myLong)) + SIN(RADIANS(@myLat)) * SIN(RADIANS(@destLat)))) AS longFormat,
    haversine(@myLat,@myLong,@destLat,@destLong,'MILES') AS distanceMiles,
    haversine(@myLat,@myLong,@destLat,@destLong,'NAUTICAL_MILES') AS distanceNautical,
    haversine(@myLat,@myLong,@destLat,@destLong,'KM') AS distanceKm,
    haversine(@myLat,@myLong,@destLat,@destLong,'METRES') AS distanceMetres,    
    haversine(@myLat,@myLong,@destLat,@destLong,'YARDS') AS distanceYards,
    haversine(@myLat,@myLong,@destLat,@destLong,'FEET') AS distanceFeet,
    haversine(@myLat,@myLong,@destLat,@destLong,'') AS distanceDefault

In the example, we're using miles - so we've set the radius (@milesModifier in the test, radius in the function) to 3959 exactly.

The result I got back was interesting (on MySQL 5.2.6 community edition), highlights:

| longFormat       | distanceMiles   |
|------------------|-----------------|
| 0.99826000106148 | 0.9982578754425 |

longFormat is the maths done in the query, distanceMiles is the result of the function.

The results are different... OK, so its an insignificance as far as using the function in a project, but i'm interested to know how the same formula inside or outside of the function have different results.

I'm guessing that its to do with lengths of the FLOAT - they're not specified in the function, I have tried specifying them (right up to 30,15) to give plenty of room for all the figures I have and the output I expect - but the results still differ slightly.

Community
  • 1
  • 1
Codecraft
  • 8,291
  • 4
  • 28
  • 45
  • The FLOAT data type is approximate. Have you tried it with a DECIMAL data type instead? http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html – Mike Jun 23 '11 at 16:40
  • Well you learn something new every day... Whats the advantage of the approximation, does it make the calculations quicker? – Codecraft Jun 23 '11 at 17:28
  • It's to do with [data storage requirements](http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html). See also [Problems with Floating-Point Values](http://dev.mysql.com/doc/refman/5.1/en/problems-with-float.html). I've just tried it with `DECIMAL(30,15)` and got the same result from both calculations. You might be able to tweak that to suit your exact requirements. – Mike Jun 23 '11 at 17:51
  • Great stuff, as I said... something new learnt every day! If you want to make this an official answer i'll happily accept it. – Codecraft Jun 23 '11 at 18:28

1 Answers1

5

FLOAT is an approximate data type - see:

Problems with Floating-Point Values
Numeric Types

Try changing FLOAT to DECIMAL(30,15) to ensure that you have the correct precision.

If you want an in-depth discussion of floating point, you could try this article:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Mike
  • 21,301
  • 2
  • 42
  • 65
  • Look. There is no point in using more precision that offered by the float data type (32-bit floating point) for latitude-longitude work. Why? Because the approximation that the earth is a sphere, the approximation used by the haversine great-circle formula, starts to break down when you work with distances less than a meter or so. And 32-bit floating gives you more precision than that. – O. Jones Jul 12 '11 at 20:24
  • The difference in distance between the two results in the question is 135 mils (0.135 inches). If you're a civil engineer planning drains, this might be significant. Otherwise it isn't. – O. Jones Jul 12 '11 at 20:30
  • @Ollie Jones: From the OP's question "*The results are different... OK, so its an insignificance as far as using the function in a project, but i'm interested to know how the same formula inside or outside of the function have different results.*" So we know that the precision is more than required in this situation, but the level of precision was not the question. The OP wanted to know why the results were different, and the reason was the use of FLOAT. – Mike Jul 12 '11 at 20:50
  • @Ollie Jones: From my comments on the OP's question: "*I've just tried it with DECIMAL(30,15) and got the same result from both calculations. You might be able to tweak that to suit your exact requirements.*" Adjusting the level of precision has therefore already been suggested. – Mike Jul 12 '11 at 20:55