4

Goal


There are two goals:

  1. Sort latitude and longitude by distance
  2. Group the latitude and longitude based on a tolerance

Expected


Latitude and longitude are sorted in order of distance and grouped by tolerance.

Actual


Latitude and longitude are sorted in order of distance but the grouping does not feel right.

What I've tried


I've used the Haversine formula from several StackOverflow answers e.g. example 1, example 2, example 3, and managed to sort the locations by distance (goal #1 - I think ). I attempted to group them using a tolerance via Math.abs(lat - lastLat) < tolerance but I'm not confident it's working or flexible (goal #2).

Code


const locations = [
  { lat: 77.62279999, lng: 12.95248389 },
  { lat: 77.62517676, lng: 12.95027966 },
  { lat: 77.62753442, lng: 12.93745478 },
  { lat: 77.62217671, lng: 12.93353553 },
];

const distance = (lat1, lon1, lat2, lon2) => {
  const radlat1 = (Math.PI * lat1) / 180;
  const radlat2 = (Math.PI * lat2) / 180;

  const theta = lon1 - lon2;
  const radtheta = (Math.PI * theta) / 180;

  let dist =
    Math.sin(radlat1) * Math.sin(radlat2) +
    Math.cos(radlat1) * Math.cos(radlat2) * Math.cos(radtheta);
  dist = Math.acos(dist);
  dist = (dist * 180) / Math.PI;
  dist = dist * 60 * 1.1515;
  dist = dist * 1.609344;

  return dist;
};

const sortedLocationsByDistance = locations.sort(function (a, b) {
  return distance(0, 0, a.lat, a.lng) - distance(0, 0, b.lat, b.lng);
});

console.log('Sorted:', sortedLocationsByDistance);

const groupedLocationsByTolerance = {};

const tolerance = 0.001;

sortedLocationsByDistance.forEach(({ lat, lng }, index) => {
  if (
    Object.keys(groupedLocationsByTolerance).length && 
    Math.abs(lat - sortedLocationsByDistance.slice(-1)[0].lat) < tolerance &&
    Math.abs(lng - sortedLocationsByDistance.slice(-1)[0].lng) < tolerance
  ) {
    groupedLocationsByTolerance[Object.keys(groupedLocationsByTolerance).slice(-1)[0]].push({ lat, lng });
    return;
  }

  groupedLocationsByTolerance[index] = groupedLocationsByTolerance[index] || [];
  groupedLocationsByTolerance[index].push({ lat, lng });
});

console.log('Grouped:', groupedLocationsByTolerance);
Paddy
  • 1,175
  • 1
  • 14
  • 23

2 Answers2

1

This is the solution I ended up at. Please feel free to tweak and refine it as you see fit (post your own to help others). I realise there are a number of ways you could achieve it.

What I've tried


As seen above, the first approach used a center point (0, 0) and measured the distance to each pin via Haversine formula. The next solution used an algorithm that calculated the angle of the ray between each point using trigonometric functions (a big thank you to this answer).

const getSortedByDistance = (
  coordinates,
  centerPoint
) => {
  const points = [...coordinates]; // Copy the array, sort modifies it
  const angles = {};

  // Calculate the angle of the ray between that center point and each point, using inverse trigonometric functions
  points.forEach((point) => {
    angles[getCoordinateKey(point.latitude, point.longitude)] = Math.atan2(
      point.latitude - centerPoint.latitude,
      point.longitude - centerPoint.longitude
    );
  });

  // Filter the order based on the angle
  points.sort(
    (pointA, pointB) =>
      angles[getCoordinateKey(pointA.latitude, pointA.longitude)] -
      angles[getCoordinateKey(pointB.latitude, pointB.longitude)]
  );

  return points;
};

While this grouped coordinates fairly well, it did not guarantee they were in the correct order; making the last group check fail in some circumstances.

What I did


After a bit of thinking, the best option is to use the Haversine formula to measure the distance pin-to-pin; creating accurate grouping. The only downside to this is the performance cost for the full calculation.

const getDistance = (lat1, lon1, lat2, lon2) => {
    const radlat1 = (Math.PI * lat1) / 180;
    const radlat2 = (Math.PI * lat2) / 180;

    const theta = lon1 - lon2;
    const radtheta = (Math.PI * theta) / 180;

    let dist =
        Math.sin(radlat1) * Math.sin(radlat2) +
        Math.cos(radlat1) * Math.cos(radlat2) * Math.cos(radtheta);
    dist = Math.acos(dist);
    dist = (dist * 180) / Math.PI;
    dist = dist * 60 * 1.1515;
    dist = dist * 1.609344;

    return dist;
};

const getGroupsByDistance = (coordinates, tolerance) => {
    const coordinatesToleranceMap = Array(coordinates.length).fill(1); // 1 = true

    return coordinates.reduce(
        (groups, coordinate, index) => {
            // The tolerance map contains true/false (0/1) values, ignore anything with 0
            if (coordinatesToleranceMap[index] === 0) return groups;

            const { latitude, longitude } = coordinate;

            // This will return the current listing because it's the same lat/lng. We don't need to check it's length because it always has one
            const coordinatesWithinTolerance = coordinates.filter(
                (coordinate, index) => {
                    // We're not interested in any listing that have been filtered out already
                    if (coordinatesToleranceMap[index] === 0) {
                        return false;
                    }

                    const isSameLatLng =
                        latitude === coordinate.latitude &&
                        longitude === coordinate.longitude;

                    // Measure distance using Haversine formula
                    const isWithinTolerance =
                        isSameLatLng ||
                        getDistance(
                            latitude,
                            longitude,
                            coordinate.latitude,
                            coordinate.longitude
                        ) <= tolerance;

                    // Ignore the current listing on the next filter
                    if (isWithinTolerance) coordinatesToleranceMap[index] = 0;

                    return isWithinTolerance;
                }
            );

            groups.push({
                latitude,
                longitude,
                properties: coordinatesWithinTolerance
            });

            return groups;
        },
        []
    );
};

const locations = [
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 78.62217671, longitude: 12.93353553 },
  { latitude: 77.62517676, longitude: 12.95027966 },
  { latitude: 77.62753442, longitude: 12.93745478 },
  { latitude: 77.62752442, longitude: 12.93745478 },
  { latitude: 77.62214671, longitude: 12.93353553 },
];

const tolerance = 0.01;

const t0 = performance.now();
const groupsByDistance = getGroupsByDistance(locations, tolerance);
const t1 = performance.now();

console.log('Time:', t1 - t0);
console.log(groupsByDistance);

Performance


The function will take longer to execute as the dataset grows. It needs chunking down in order to reduce the execution time. Alternatively, reducing the amount of looping could lead to a significant improvement.

Improvements


If anybody can come up with a better answer (improvement/new logic) that is performant, please add it below.

Paddy
  • 1,175
  • 1
  • 14
  • 23
1

For very large sets, the number of iterations can be reduced significantly by a combination of both(!) of your answers. The logic is thus...

  • O( n ) - Calculate the distance of every lat/long from a fixed point. In the example below, the origin lat/long of ( 0, 0 ) is used as the fixed point.

  • O( n log n ) - Sort this array by ascending distance.

  • O( nm ) - Now, similar to a bubble sort, walk this array with a nested pair of loops such that while the inner loop distance from origin is still within the tolerance of the outer loop distance from origin, perform the expensive getDistance check to determine if the coordinates are indeed within tolerance. Continue in the inner loop while the distance from origin difference is still within the tolerance, otherwise break from the inner loop, as there is no sense in comparing the distances now known to be further apart than the tolerance. In this way, the algorithm is only checking a handful of distances that are within range of the tolerance.

The "m" in O( nm ) is the average number of locations within tolerance, and for large numbers of locations, should be relatively small compared to "n", assuming some distribution of the locations and a small tolerance. For large tolerances, of course, "m" will approach "n" and then the algorithm is performing at O( n^2 )...

const locations = [
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 78.62217671, longitude: 12.93353553 },
  { latitude: 77.62517676, longitude: 12.95027966 },
  { latitude: 77.62753442, longitude: 12.93745478 },
  { latitude: 77.62752442, longitude: 12.93745478 },
  { latitude: 77.62214671, longitude: 12.93353553 },
];

const getDistance = (lat1, lon1, lat2, lon2) => {
    const radlat1 = (Math.PI * lat1) / 180;
    const radlat2 = (Math.PI * lat2) / 180;

    const theta = lon1 - lon2;
    const radtheta = (Math.PI * theta) / 180;

    let dist =
        Math.sin(radlat1) * Math.sin(radlat2) +
        Math.cos(radlat1) * Math.cos(radlat2) * Math.cos(radtheta);
    dist = Math.acos(dist);
    dist = (dist * 180) / Math.PI;
    dist = dist * 60 * 1.1515;
    dist = dist * 1.609344;

    return dist;
};


for ( let loc of locations ) {
  loc.distanceFromOrigin = getDistance( loc.latitude, loc.longitude, 0, 0 );
}

console.log( 'Sorted by closest to origin:' );
locations.sort( ( a, b ) => a.distanceFromOrigin - b.distanceFromOrigin );
console.log( locations );

let tolerance = 0.01;
let results = [];

for ( let i = 0; i < locations.length; i++ ) {
  let loci = locations[ i ];
  for ( let j = i + 1; j < locations.length; j++ ) {
    let locj = locations[ j ];
    if ( locj.distanceFromOrigin - loci.distanceFromOrigin <= tolerance ) {
      if ( getDistance( loci.latitude, loci.longitude, locj.latitude, locj.longitude ) <= tolerance ) {
        loci.properties = loci.properties ?? [];
        loci.properties.push( locj );
      }
    } else {
      break;
    }
  }
  if ( loci.properties ) {
    results.push( loci );
  }
}

console.log( `Closest by tolerance of ${tolerance}:` );
console.log( results );
    
Trentium
  • 3,419
  • 2
  • 12
  • 19
  • This is some interesting/quality work! Thank you for replying. I've noticed the tolerance doesn't seem to group multiple properties. Is that right? – Paddy Oct 10 '22 at 16:59
  • 1
    I took the liberty of excluding the location from its own property list, and hence the results (based on the sample data) only show one grouped property. If in the inner loop you change `j = i + 1` to `j = i`, then the inner loop will begin by comparing the location against itself, and the properties will then include itself, and the results will then match what you've been seeing... – Trentium Oct 10 '22 at 17:08
  • Thanks again for this answer. I've had the chance to try it and it does reduce the number of iterations by around 50-55%. Would you recommend the traditional for loop over a reduce/filter? I know the old method is meant to be quicker but after running some `performance.now()` tests it varies between implementation method. I like the readability of the newer syntax. – Paddy Oct 12 '22 at 16:31
  • 1
    Usually when I propose an answer involving an algorithm, I use the traditional looping structures simply for readability purposes, and in particular for clearer interpretation of the Big O notation. Reduce/Filter will also do the trick, but be mindful of the current short circuiting of the inner loop which is the source of the time savings. Ie, if a filter is used where the inner loop is, it will scan the entire list of locations to find those where the `distanceFromOrigin` is in tolerance, and on top of that, create a new array(!)... – Trentium Oct 12 '22 at 19:06