4

Is there a way to run in Scala a geospatial query, given a set of lat/lon coordinates, to find nearest by distance? The query needs to run in memory possibly.

The set of values is roughly 1 million lon/lat coordinates. I am trying to do that in Spark but the only solution I have found is Magellan but I cannot make it even work for Spark 1.6 and Scala 2.11 so I am trying customized solution.

Example of query: Given one point in wgs84 coordinates and the 1 million set of wsg84 coords, I want the nearest 15 coords in a radius of one mile.

marcospereira
  • 12,045
  • 3
  • 46
  • 52
Randomize
  • 8,651
  • 18
  • 78
  • 133

2 Answers2

1

Here is a library with RTree implemetation that can be used for indexing of geo data in Scala: https://github.com/davidmoten/rtree

Just select by bounding box rectangle(s) for your point which will be center of a circle with given radius (distance in your case) and then filter points by the distance to cut out false positives in corners of bounding boxes and then sort results by already calculated distance to take required the nearest 15.

You can use the ‘haversine’ formula to check distance condition between points (see description here http://www.movable-type.co.uk/scripts/latlong.html):

import java.lang.Math._
import com.github.davidmoten.rtree.geometry.{Point, Rectangle}
import com.github.davidmoten.rtree.geometry.Geometries._

def distance(p1: Point, p2: Point): Double = {
  val radLon1 = toRadians(p1.x)
  val radLat1 = toRadians(p1.y)
  val radLon2 = toRadians(p2.x)
  val radLat2 = toRadians(p2.y)
  val x = sin((radLon2 - radLon1) * 0.5)
  val y = sin((radLat2 - radLat1) * 0.5)
  val a = y * y + cos(radLat1) * cos(radLat2) * x * x
  atan2(sqrt(a), sqrt(1 - a)) * 12756274 // The Earth diameter in meters
}

For calculation of bounding boxes use following function:

def boundingRectangles(c: Point, r: Double): List[Rectangle] = {
  val radLon = toRadians(c.x)
  val radLat = toRadians(c.y)
  val radDist = r / 6378137 // The Earth radius in meters
  val lat1 = toDegrees(radLat - radDist)
  val lat2 = toDegrees(radLat + radDist)
  if (lat1 > -90 && lat2 < 90) {
    val deltaLon = asin(sin(radDist) / cos(radLat))
    val lon1 = toDegrees(radLon - deltaLon)
    val lon2 = toDegrees(radLon + deltaLon)
    if (lon1 < -180) rectangle(-180, lat1, lon2, lat2) :: rectangle(lon1 + 360, lat1, 180, lat2) :: Nil
    else if (lon2 > 180) rectangle(-180, lat1, lon2 - 360, lat2) :: rectangle(lon1, lat1, 180, lat2) :: Nil
    else rectangle(lon1, lat1, lon2, lat2) :: Nil
  } else rectangle(-180, max(lat1, -90), 180, min(lat2, 90)) :: Nil
}

List of rectangles required for case when a circle is crossed by the date change meridian, because the RTree doesn't support wrapping of geo-coordinates over the Earth, so we split that rectangles on two by the date change meridian.

Formula and description are here http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates#Longitude

EDIT: Finally we ended up to have our own version of the immutable RTree with STR packing that is tuned for efficient window and knn queries on both plane and spherical geometries:

https://github.com/plokhotnyuk/rtree2d

Andriy Plokhotnyuk
  • 7,883
  • 2
  • 44
  • 68
0

if you want arbitrary datums then you probably need a library but if it is just distance in wgs84 it is a straight forward formula see for example the response to Calculate distance in meters when you know longitude and latitude in java

Community
  • 1
  • 1
Arnon Rotem-Gal-Oz
  • 25,469
  • 3
  • 45
  • 68