
I'm building an online Symfony application, and as part of the development process I've been tasked with sorting a set of database records by their distance from the logged-in user; that user can widen the search radius at will, up to the size of the whole world.

At any moment I have access to the GPS coordinates of the logged-in user, and a database table stores the latitude and longitude of the various points of interest.

Currently there are only 400 records in the POI table, but due to the amount of data I must extract whenever I access it, the query time is already slightly over a second. Adding 400 trigonometric calculations to that workload would soon push the execution time beyond what's acceptable.

I thus need a method that's both fast and accurate to calculate these distances.

I've read multiple articles suggesting the Haversine formula, but I found it too slow for my needs, and even an extensive article like this couldn't be of any help.

Considering that I could soon reach thousands of POIs with thousands of users logged in at the same time from all over the world, how could I approach (and hopefully solve) this problem?

I'm using PHP 7.0, Symfony 3.2, and Doctrine, with PDO to interface with a MySQL server running the InnoDB engine.
My customer values accuracy over speed, but can't stand waiting more than 5 seconds.
The query results are paged, so delegating the sorting to the client is impossible.
Both the database and the PHP server share the same (terrible) resource pool, and that pool is also shared with other applications.

On a side note, some of the POIs may expire after a certain date.

JValker
  • 374
  • 2
  • 14
  • I am not sure `Haversine` is your performance issue; we use this in production and we do over 150k searches a minute (well, it does use multiple PHP workers), but have you actually benchmarked how long it takes to calculate? – ArtisticPhoenix Jul 31 '17 at 15:33
  • here is a question with it in PHP code https://stackoverflow.com/questions/14750275/haversine-formula-with-php – ArtisticPhoenix Jul 31 '17 at 15:33
  • there is a way of calculating it in MySQL, but it's not accurate, and slower. – ArtisticPhoenix Jul 31 '17 at 15:34
  • @ArtisticPhoenix I only benchmarked through Symfony's profiler, and the already high query time got about 5% higher when I started factoring in the distance calculation and sorting. – JValker Jul 31 '17 at 15:37
  • I don't use Symfony, I only code my own stuff, but you can borrow my benchmark class https://github.com/ArtisticPhoenix/Evo/blob/master/Evo/Benchmark.php: you just do `$mark = Benchmark::getInstance()->mark()` and then afterwards `Benchmark::getInstance()->format($mark)` – ArtisticPhoenix Jul 31 '17 at 15:40
  • @ArtisticPhoenix thanks a bunch for your benchmark class. Apparently, you were right; the query without the distance calculation actually takes 83ms, instead of the whole second reported by Symfony's profiler, and a measly 85ms after adding the Haversine formula. If you'd like to add an answer, I'd be glad to mark it as the accepted one. – JValker Jul 31 '17 at 16:20
  • A design that has worked well for me goes something like this... (1) Add a bit of space around the lat/lon pairs by increasing and decreasing the geocode values, (2) SELECT everything within those bounds into a temporary table ENGINE=MEMORY, (3) compute the distances using the values in the temporary table, then SELECT from the temporary table, ORDER BY distance. Since I did the calculation in PHP, it did not seem to matter much whether I used plain geometry or Haversine. IIRC, Haversine seemed to be a bit more accurate above about 100 miles. – Ray Paseur Jul 31 '17 at 16:43
  • @Vkfan - added it. Thanks! – ArtisticPhoenix Jul 31 '17 at 17:40

1 Answer


You asked me to add it so I will.

Are you sure the performance hit is from Haversine? We have used a PHP implementation of this formula in production at my work successfully for about 2 years, and we do a high volume of searches (around 150k a minute at peak times).
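For reference, the standard Haversine formula in plain PHP looks something like this (a sketch of the textbook formula, not our production code; it assumes decimal-degree coordinates and returns kilometres):

```php
<?php
// Haversine great-circle distance between two lat/lon points, in kilometres.
// Assumes decimal-degree inputs and a mean Earth radius of 6371 km.
function haversine(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $r = 6371.0; // mean Earth radius in km
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $r * asin(sqrt($a));
}
```

It's a handful of trig calls per point; over 400 rows that is microseconds of work in PHP, which is why I'd benchmark before blaming it.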

I can't get into the details of my work, but I can say we use a combination of Sphinx, MongoDB, MySQL and RabbitMQ.

In any case, both Sphinx and MySQL suffer from a poor implementation of distance calculations, losing about 2 miles of accuracy at 100 miles of distance (this is why we use the PHP implementation instead).
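As suggested in the comments, a common pattern is to do a cheap bounding-box prefilter in SQL and leave the precise Haversine ordering to PHP on the few rows that survive. A sketch (the `poi` table and column names are hypothetical, and the box is approximate, so it must be slightly wider than the radius, and degenerates near the poles):

```php
<?php
// Approximate bounding box around ($lat, $lon) for a radius in km.
// Returns [minLat, maxLat, minLon, maxLon] in decimal degrees.
function boundingBox(float $lat, float $lon, float $radiusKm): array
{
    $latDelta = rad2deg($radiusKm / 6371.0);                        // km -> degrees of latitude
    $lonDelta = rad2deg($radiusKm / (6371.0 * cos(deg2rad($lat)))); // longitude degrees shrink with latitude
    return [$lat - $latDelta, $lat + $latDelta, $lon - $lonDelta, $lon + $lonDelta];
}

// The cheap SQL prefilter; with an index on (latitude, longitude) MySQL can
// skip most rows, and exact Haversine sorting is then done on the survivors.
$sql = 'SELECT id, latitude, longitude FROM poi
        WHERE latitude  BETWEEN :minLat AND :maxLat
          AND longitude BETWEEN :minLon AND :maxLon';
```

The point is that `BETWEEN` on indexed numeric columns is far cheaper than a per-row trig expression, so the expensive formula only runs on candidates inside the box.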

One thing you can do is benchmark the time it takes to run the Haversine formula; good benchmarking is the first step when you have performance issues.

While I am not a Symfony user, I do have a class I made for just this purpose. It's part of a larger framework I am building in my spare time (Evolution). You can get the class here:

https://github.com/ArtisticPhoenix/Evo/blob/master/Evo/Benchmark.php

It's very simple to use:

    $mark = Benchmark::getInstance()->mark();

    // ... code to time ...

    echo Benchmark::getInstance()->format($mark);

And it will output something like:

    10 milliseconds
    5 minutes 3 milliseconds

etc.

It's designed so you can use multiple marks:

    $mark = Benchmark::getInstance()->mark();

    // ... code to time ...

    $mark1 = Benchmark::getInstance()->mark();

    // ... more code to time ...

    echo "TotalTime: ".Benchmark::getInstance()->format($mark);
    echo "MethodTime: ".Benchmark::getInstance()->format($mark1);

etc.

It basically just records microtime(true) (true returns a float) when you call mark() and hands back an identifier $mark; calling mark($mark) with that identifier subtracts the stored time from the current microtime(true), and format($mark) just makes the result more human-readable.
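The mechanism can be sketched in a few lines (my own simplified version to show the idea, not the actual Evo\Benchmark class, which is a singleton and has richer formatting):

```php
<?php
// Minimal sketch of the mark()/format() mechanism described above.
class SimpleBenchmark
{
    private $marks = [];

    // Record the current time and hand back an identifier for it.
    public function mark()
    {
        $id = count($this->marks);
        $this->marks[$id] = microtime(true); // float seconds since epoch
        return $id;
    }

    // Elapsed time since the given mark, as human-readable text.
    public function format($id)
    {
        $ms = (microtime(true) - $this->marks[$id]) * 1000;
        return round($ms) . ' milliseconds';
    }
}
```

Because each mark is independent, nested or overlapping timings (total time vs. a single method, as shown above) fall out for free.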

Hope it helps you!

ArtisticPhoenix
  • 21,464
  • 2
  • 24
  • 38