How can I compute the distance between Location A and B, and between Location A and C, in the Spark DataFrame below?

    spark = SparkSession(sc)
    df = spark.createDataFrame(
        [('A', 40.202750, 29.168350, 'B', 40.689247, -74.044502),
         ('A', 40.202750, 29.168350, 'C', 25.197197, 55.274376)],
        ['Location1', 'Lat1', 'Long1', 'Location2', 'Lat2', 'Lon2'])

This gives the following dataset:

    +---------+--------+--------+---------+---------+----------+
    |Location1|    Lat1|   Long1|Location2|     Lat2|      Lon2|
    +---------+--------+--------+---------+---------+----------+
    |        A|40.20275|29.16835|        B|40.689247|-74.044502|
    |        A|40.20275|29.16835|        C|25.197197| 55.274376|
    +---------+--------+--------+---------+---------+----------+

Thank you

melik
  • 1
    You could use something like `val df1 = df.withColumn("distance", distance(df("Lat1"), df("Long1"), df("Lat2"), df("Long2")))` - you will need to write the function to work out the distance (eg https://stackoverflow.com/questions/837872/calculate-distance-in-meters-when-you-know-longitude-and-latitude-in-java) – PJ Fanning Jul 29 '19 at 14:36
  • 1
    Maybe something like https://datasystemslab.github.io/GeoSpark/ could help – PJ Fanning Jul 29 '19 at 14:38

1 Answer

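You could use the Haversine formula, which gives the great-circle distance in kilometres (6378 is the Earth's equatorial radius in km, and all angles are in radians):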

2*6378*asin(sqrt(pow(sin((lat2-lat1)/2),2) + cos(lat1)*cos(lat2)*pow(sin((lon2-lon1)/2),2)))

You can then wrap the formula in a UDF:

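import math
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType

def haversine(lat1, lon1, lat2, lon2):
    # All four arguments must be in radians; the result is in kilometres
    a = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6378 * math.asin(math.sqrt(a))

dist_udf = F.udf(haversine, FloatType())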

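Make sure your latitude and longitude values are in radians. The values in the question are in degrees, so they need converting first.

You could do that conversion inside the haversine function before the calculation (for example with math.radians), or convert the columns with F.radians before passing them to the UDF.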

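Once you have the UDF, you can apply it directly. Here the degree columns are converted with F.radians, and the column names match the question's DataFrame (the last one is Lon2, not Long2):

df = df.withColumn('distance', dist_udf(F.radians(F.col('Lat1')), F.radians(F.col('Long1')), F.radians(F.col('Lat2')), F.radians(F.col('Lon2'))))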
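
If you prefer to fold the degree-to-radian conversion into the function itself, a minimal plain-Python sketch could look like the following. The names haversine_deg and EARTH_RADIUS_KM are just illustrative, and the None check and the clamp before asin are my own defensive additions rather than part of the formula.

```python
import math

# Equatorial radius in km, as in the formula above; 6371 (mean radius) is also commonly used
EARTH_RADIUS_KM = 6378

def haversine_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    # Missing coordinates reach a Python UDF as None, so pass them through
    if None in (lat1, lon1, lat2, lon2):
        return None
    # Convert degrees to radians before applying the formula
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard asin against floating-point overshoot
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

A function like this should be usable with F.udf in the same way as above, with the degree columns passed in unconverted.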
fuzzy-memory