
I have two collections. Each collection consists of inner collections, each containing a latitude, longitude, and epoch.

val arr1 = Seq(
  Seq(34.464, -115.341, 1486220267.0), Seq(34.473, -115.452, 1486227821.0),
  Seq(35.572, -116.945, 1486217300.0), Seq(37.843, -115.874, 1486348520.0),
  Seq(35.874, -115.014, 1486349803.0), Seq(34.345, -116.924, 1486342752.0))

val arr2 = Seq(Seq(35.573, -116.945, 1486217300.0), Seq(34.853, -114.983, 1486347321.0))

I want to determine how many times points in the two collections are within .5 miles of each other and have the same epoch. I have two functions:

def haversineDistance_single(pointA: (Double, Double), pointB: (Double, Double)): Double = {
  val deltaLat = math.toRadians(pointB._1 - pointA._1)
  val deltaLong = math.toRadians(pointB._2 - pointA._2)
  val a = math.pow(math.sin(deltaLat / 2), 2) + math.cos(math.toRadians(pointA._1)) * math.cos(math.toRadians(pointB._1)) * math.pow(math.sin(deltaLong / 2), 2)
  val greatCircleDistance = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
  3958.761 * greatCircleDistance
}

def location_time(col_2: Seq[Seq[Double]], col_1: Seq[Seq[Double]]): Int = {
  val arr = col_1.map(x => col_2.filter(y =>
    (haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= .5) &
      (math.abs(y(2) - x(2)) <= 0)).flatten).filter(x => x.length > 0)
  arr.length
}


location_time(arr1, arr2) // = 1

My actual collections are very large. Is there a more efficient way to compute this than my location_time function?

mikeL

1 Answer


I would consider revising location_time from:

def location_time(col_mobile: Seq[Seq[Double]], col_laptop: Seq[Seq[Double]]): Int = {
  val arr = col_laptop.map( x => col_mobile.filter( y =>
      (haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= .5) & (math.abs(y(2) - x(2)) <= 0)
    ).flatten
  ).filter(x => x.length > 0)

  arr.length
}

to:

def location_time(col_mobile: Seq[Seq[Double]], col_laptop: Seq[Seq[Double]]): Int = {
  val arr = col_laptop.flatMap( x => col_mobile.filter( y =>
      (math.abs(y(2) - x(2)) <= 0 && haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= .5)
    )
  )

  arr.length
}

Changes made:

  1. Revised col_mobile.filter(y => ...) from:

    filter(_ => costlyCond1 & lessCostlyCond2)
    

    to:

    filter(_ => lessCostlyCond2 && costlyCond1)
    

    Assuming haversineDistance_single is more costly to run than math.abs, replacing & with && and testing math.abs first should help the filtering performance: unlike &, && short-circuits, so the haversine distance is never computed for pairs whose epochs already differ.

  2. Simplified map/filter/flatten/filter using flatMap, replacing:

    col_laptop.map(x => col_mobile.filter(y => ...).flatten).filter(_.length > 0)
    

    with:

    col_laptop.flatMap( x => col_mobile.filter( y => ... ))

    One caveat: the two versions count slightly different things. The original counts laptop points with at least one match, while the flatMap version counts every matching (laptop, mobile) pair; they agree whenever each laptop point matches at most one mobile point, as in your sample data.
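For a quick sanity check, here is the revised version run end-to-end against the sample data from the question, written as a standalone Scala script:

```scala
// Same haversine great-circle distance (in miles) as in the question.
def haversineDistance_single(pointA: (Double, Double), pointB: (Double, Double)): Double = {
  val deltaLat = math.toRadians(pointB._1 - pointA._1)
  val deltaLong = math.toRadians(pointB._2 - pointA._2)
  val a = math.pow(math.sin(deltaLat / 2), 2) +
    math.cos(math.toRadians(pointA._1)) * math.cos(math.toRadians(pointB._1)) *
    math.pow(math.sin(deltaLong / 2), 2)
  2 * 3958.761 * math.atan2(math.sqrt(a), math.sqrt(1 - a)) // Earth radius in miles
}

// Cheap epoch test first; && short-circuits past the haversine call on epoch mismatches.
def location_time(col_mobile: Seq[Seq[Double]], col_laptop: Seq[Seq[Double]]): Int =
  col_laptop.flatMap(x => col_mobile.filter(y =>
    math.abs(y(2) - x(2)) <= 0 &&
      haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= 0.5)).length

val arr1 = Seq(
  Seq(34.464, -115.341, 1486220267.0), Seq(34.473, -115.452, 1486227821.0),
  Seq(35.572, -116.945, 1486217300.0), Seq(37.843, -115.874, 1486348520.0),
  Seq(35.874, -115.014, 1486349803.0), Seq(34.345, -116.924, 1486342752.0))
val arr2 = Seq(Seq(35.573, -116.945, 1486217300.0), Seq(34.853, -114.983, 1486347321.0))

println(location_time(arr1, arr2)) // prints 1: one pair shares an epoch and is ~0.07 miles apart
```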

In case you have access to, say, an Apache Spark cluster, consider converting your collections (if they're really large) to RDDs and computing the count with transformations similar to the above.
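Even without Spark, since the epoch test is exact equality, you can avoid the full cross-product by grouping one collection by epoch up front, so each point is compared only against candidates sharing its timestamp. A minimal sketch (location_time_grouped is my name for it, not from the original code):

```scala
// Same haversine great-circle distance (in miles) as in the question.
def haversineDistance_single(pointA: (Double, Double), pointB: (Double, Double)): Double = {
  val deltaLat = math.toRadians(pointB._1 - pointA._1)
  val deltaLong = math.toRadians(pointB._2 - pointA._2)
  val a = math.pow(math.sin(deltaLat / 2), 2) +
    math.cos(math.toRadians(pointA._1)) * math.cos(math.toRadians(pointB._1)) *
    math.pow(math.sin(deltaLong / 2), 2)
  2 * 3958.761 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
}

// Bucket the mobile points by epoch once; each laptop point then pays the
// haversine cost only for the exact-timestamp candidates found by hash lookup.
def location_time_grouped(col_mobile: Seq[Seq[Double]], col_laptop: Seq[Seq[Double]]): Int = {
  val byEpoch = col_mobile.groupBy(_(2))
  col_laptop.flatMap { x =>
    byEpoch.getOrElse(x(2), Seq.empty).filter(y =>
      haversineDistance_single((y(0), y(1)), (x(0), x(1))) <= 0.5)
  }.length
}
```

This trades one pass to build the map for a per-point lookup, which matters when both collections are large; the same grouping idea carries over to Spark as a join on the epoch key.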

Leo C