0

So I'm trying to create a list of the 15 closest people in an array of varying sizes. The array will almost always be ~100 objects in size, but for the sake of testing, I'm trying to make it work with 10,000 (there may be need for the project to be scaled up to these numbers further down the line).

Currently, the method in place is to loop through the array of people, and calculate their proximity to the user based on both the person in question's and the user's latitude and longitude (the former of which is stored in the array). This is done using the haversine formulae and works well (though it does take ~500 milliseconds).

The problem however is that when run on a mobile device (a Samsung Galaxy S5 for the sake of this example), performance really suffers. The time taken for the S5 to sort through 10,000 records in order of how close they are to a pre-determined latitude and longitude is a staggering 1,500-1,600 milliseconds, an unacceptable delay for an app that will be doing many things either side of this process.

So my question is, am I missing some fundamentally more efficient means of sorting this list? Is there an alternative formulae available that is more efficient? Could I simply calculate the combined difference in Latitude and Longitude in .000001s and sort based on that?

Notes:

  1. The user's location is variable, so proximities cannot be stored

  2. I am aware that I'm asking a mobile CPU to perform 100,000,000 calculations in a short space of time and so this may be unavoidable

  3. The method of sorting is the native JavaScript sort method, below is a simplified version of what I am doing to test these timings:

patientArray.sort(function(a, b)
{
    return GetDistanceToPoint(a["Lat"], a["Lng"]) - GetDistanceToPoint(b["Lat"], b["Lng"]);
});

// Function to get the User's distance to a point
  function GetDistanceToPoint(Latitude, Longitude)
  {
   // Check if the User's current Latitude and Longitude are available
   if(currentLat && currentLng)
   {
    // Convert degrees to a radius
    function degreeToRadius(degree)
    {
     return degree * (Math.PI/180)
    }

    // Variable to store radius of the Earth in Km
    var earthRadius = 6371;
    
    // Calculate the distance between the two points
    var dLat = degreeToRadius(Latitude-currentLat);
    var dLon = degreeToRadius(Longitude-currentLng); 
    var a = Math.sin(dLat/2) * Math.sin(dLat/2) + Math.cos(degreeToRadius(currentLat)) * Math.cos(degreeToRadius(Latitude)) * Math.sin(dLon/2) * Math.sin(dLon/2); 
    var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); 
    var d = earthRadius * c;
    return d;
   }
   return "-1";
  }
Duncan McArdle
  • 501
  • 5
  • 14
  • Instead of your own loop/sort have you considered [Array.prototype.sort](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort)? If you give a sample of your data (3 or 4 entries) then we can show you how to use it, if you don't already know. – Xotic750 Jun 17 '15 at 13:32
  • I am currently using the native JS sort mechanism, apologies, I've now make that clearer in the OP (with some samples) :) – Duncan McArdle Jun 17 '15 at 13:44
  • 1
    By this description, you send an array of 10000 records to a mobile device and let it do the filtering client-side? Why not just do the whole calculation on the server? – blgt Jun 17 '15 at 13:45
  • 1
    There are a couple of things that you can do to improve the code you have shown, define `degreeToRadius(degree)` outside of `GetDistanceToPoint(Latitude, Longitude)`. Pre calculate `Math.PI/180)`. Only calculate `Math.sin(dLat/2` and `Math.sin(dLon/2)` once and store it in a variable for use. This will probably not make huge differences. – Xotic750 Jun 17 '15 at 13:50
  • @blgt Unfortunately this can only work on the client – Duncan McArdle Jun 17 '15 at 13:52
  • @Xotic750 Hmm, an interesting idea, I'll checkout what savings can be made doing this now, thanks :) – Duncan McArdle Jun 17 '15 at 13:53
  • @DuncanMcArdle Obviously, otherwise you wouldn't be asking. But if you provide a bit more context about the bizarre choice of architecture we might be able to provide more help – blgt Jun 17 '15 at 13:54
  • @Xotic750 Your ideas brought with them a 5-600ms saving when working on a 10,000 record array, unfortunately it doesn't solve the problem enough, but it certainly puts a dent in it, thank you :) – Duncan McArdle Jun 17 '15 at 14:03
  • @blgt Not sure what more context I can give really... user is given a list of x people at the start of a day and then periodically checks whom the closest 15 are at any point in that given day, without an internet connection. – Duncan McArdle Jun 17 '15 at 14:04
  • 1
    @DuncanMcArdle The array being immutable on a day-by-day basis means that the input data is only the user's coordinates, not the whole array. You could optimise by pre-processing it. This related question has several good ideas: http://stackoverflow.com/questions/1901139/closest-point-to-a-given-point – blgt Jun 17 '15 at 14:16
  • @blgt Thanks for the link. Although there a number of points in there, the concept of simply using "diffX^2 + diffY^2" to calculate the pre-squared distance actually interests me most of all. Would this be a reliable way to sort the data (after which point a real distance calculation could be performed to find the distance in a given unit)? – Duncan McArdle Jun 17 '15 at 14:32

1 Answers1

1

It all has to be tested but here are some ideas that I would try.

For heavy use of trigonometric functions you can use lookup tables. This is always a good idea. For example precompute 360 (or more) values of sin() and for every sin(radians) in your code use sinTable[degrees].

(I say 360 values as an example because with that your index is an angle in degrees but any value will do and it all depends on what precision you need - it can have thousands of values if needed.)

Avoid unnecessary calculations. May seem obvious but people often write something like x/(2*Math.PI) instead of x*A where A (a better name of course) is computed once as 1/(2*Math.PI).

Memoize every value that you can, if it makes sense.

If your data have some specific qualities, like for example never spanning half of the planet, then you can try to cheat a little bit and use coordinates on a flat plane - then you only have to compute square roots (which could also be precomputed to use lookup tables).

Those are the first things that crossed my mind. Hope it helps.

UPDATE:

You made an edit so I know a little bit more now. Here are my tips:

Don't convert degrees to radians. Keep degrees and use them as indexes in lookup tables of precomputed values of trigonometric functions. If you need more precision then multiply the degrees by 10 or something and use a scale from 0 to 3600 instead of 0 to 360. Find a good size/precision compromise that works for you.

You can eliminate all sin() and cos() calls that way and if you're lucky you can eliminate atan2(). I wouldn't worry so much about sqrt() but you can eliminate it too if you know what the values are typically going to be. If values of functions like sqrt() or atan2() are not known up fron then you can fall back to real functions for values that are out of range of your lookup tables.

Avoid to many function calls. Instead of an anonymous function that you pass to patientArray.sort(), that calls GetDistanceToPoint(), that calls degreeToRadius() - you need only one function that can be passed directly as an argument to .sort() and that function doesn't need to return d - it can return just c (in your example).

You don't need to multiply everything by earthRadius if you only use that value for sorting.

Another quick ideas: using typed arrays (for lookup tables), and asm.js and SIMD.js for additional optimization if possible.

Those are the first things that come to mind. I'd love to hear how much faster your code can get. Good luck.

UPDATE 2:

Another idea - instead of (or in addition to) optimizing the GetDistanceToPoint() you can also make sure that it isn't called more than once for every object.

Instead of:

patientArray.sort(function(a, b)
{
    return GetDistanceToPoint(a["Lat"], a["Lng"]) - GetDistanceToPoint(b["Lat"], b["Lng"]);
});

You can try doing something like:

patientArray.forEach(function (element) {
  element.distance = GetDistanceToPoint(element["Lat"], element["Lng"]);
});

or maybe for loop will be faster:

for (var i = 0; i < patientArray.length; i++) {
  var element = patientArray[i];
  element.distance = GetDistanceToPoint(element["Lat"], element["Lng"]);
}

to store the value for the entire patientArray array.

And then in the sort function:

patientArray.sort(function(a, b)
{
    return a.distance - b.distance;
});

It may hopefully save you a lot of calls to GetDistanceToPoint.

rsp
  • 107,747
  • 29
  • 201
  • 177
  • Thank you for the time you've taken to write all of this out! I've now tried a couple of your simpler suggestions such as removing the use of earth's radius (as you have rightly pointed out this is only for sorting, not accurate calculation), and have removed a couple of unnecessary calculations. Unfortunately, although I was able to remove one function to increase performance, removing all of them actually decreased it (perhaps because of a lack of caching). – Duncan McArdle Jun 17 '15 at 14:23
  • I will attempt to make use of your suggestion on a table of 360 values now, though if I'm honest my understanding in this area is slim, so I'll need to do some research. – Duncan McArdle Jun 17 '15 at 14:24
  • 1
    @DuncanMcArdle Now whan I look at this all more closely I think that you can also do one easy thing. Instead of using calling GetDistanceToPoint() in your sorting function, you can run it in a loop for all values, store the return value in an array, and use the elements of that array in your sorting function. There's a chance that GetDistanceToPoint is called much more than once for every value right now. I'll add that to the answer. – rsp Jun 17 '15 at 14:29
  • Ah I see, so perhaps perform the calculation on every record in the set, THEN sort by the distance? – Duncan McArdle Jun 17 '15 at 14:33
  • 1
    @DuncanMcArdle Yes. The sort function may call the GetDistanceToPoint() function many times for every element of patientArray before it sorts it. This way you can call it only once. See my updated answer. – rsp Jun 17 '15 at 14:41
  • Incredible, this is one of those times when I find myself thinking "How didn't I think of that!?". Thank you very much @rsp, I'll try this now. In addition, could you possibly enlighten me as to the difference between using the formulae discusses, as opposed to something as simple as "differenceX^2 + differenceY^2" for simply proximity calculation for sorting? – Duncan McArdle Jun 17 '15 at 14:46
  • 1
    A performance test of your original question code and some of the suggestions made in comments and the above answer. http://jsperf.com/30892888 – Xotic750 Jun 17 '15 at 15:03
  • @Xotic750 Thanks for that, I've already incorporated the suggestions to reduce timings significantly, I'm just interested in my previous comment to get it down as much as possible (if possible) so I can tie this off :) – Duncan McArdle Jun 17 '15 at 15:09
  • @rsp If you see this, I've upvoted your comments and will happily accept your answer once I know whether or not my recent comment on your answer is valid. Thank you for the time being though for what is already a huge change :) – Duncan McArdle Jun 17 '15 at 17:11
  • 1
    @DuncanMcArdle I said that simpler calculations could be used in some cases. For example let's say that your location are all in one city. If so then the Earth's curvature doesn't matter much and you could make calculations as if the Earth was flat. It would be simplest near the equator because the Lat and Lng would correspond to the same distance. On the north or south pole it would be harder, but still if you can project your coordinates on a flat map then you may be able to use Pythagorean theorem for calculating distance. See https://en.wikipedia.org/wiki/Map_projection for some good info. – rsp Jun 17 '15 at 23:12