
I am applying a method and it gives an error because the cast is not done correctly.

How could I 1) cast all fields more efficiently, 2) use withColumn only once, and then 3) run the method on numbers (not strings)?

q5 = q4.withColumn("DISTANCE", q4["LOCLAT"].cast(IntegerType()))
q6 = q4.withColumn("DISTANCE", q4["LOCLONG"].cast(IntegerType()))
q7 = q4.withColumn("DISTANCE", q4["LOCLAT2"].cast(IntegerType()))
q8 = q4.withColumn("DISTANCE", q4["LOCLONG2"].cast(IntegerType()))


q9 = (q4.withColumn('distance', haversine('LOCLONG', 'LOCLAT', 'LOCLONG2', 'LOCLAT2')))

Thanks!!
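The `q9` call above passes column *names* to `haversine`. If `haversine` is an ordinary Python function (such as the `math`-based version posted in the comments), it receives those names as `str` values, and `math.radians` rejects them. This is likely the source of the error. A minimal reproduction in plain Python, assuming that `math`-based version of the function:

```python
from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # math.radians only accepts real numbers; a str raises TypeError here
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    return c * 6371  # Earth radius in km


# With numbers the function works: one degree of latitude along a meridian
print(haversine(0.0, 0.0, 0.0, 1.0))  # about 111.19 km

# With column names (plain strings) it fails before Spark is ever involved
try:
    haversine('LOCLONG', 'LOCLAT', 'LOCLONG2', 'LOCLAT2')
except TypeError as exc:
    print(exc)  # e.g. "must be real number, not str"
```

Casting the DataFrame columns does not change this, because the function still receives the names, not the row values. Spark has to call the function per row (as a UDF) or the formula has to be expressed with Spark column functions.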

mck
Ana
  • You're overwriting the column 'distance' in each line and assigning the resulting DataFrame to an unused variable. Are you sure that's what you want to do? – mck Dec 12 '20 at 06:38

1 Answer


I'm not sure what you want to achieve, but here's how to convert all 4 columns to integer type and call the haversine function:

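from pyspark.sql import functions as F

# Add an integer-cast copy of each coordinate column in a single select
df = q4.select(
    '*',
    *[F.col(c).cast('int').alias(c + '_int')
      for c in ['LOCLONG', 'LOCLAT', 'LOCLONG2', 'LOCLAT2']]
)

# Compute the distance from the cast columns with one withColumn call
df = df.withColumn(
    'distance',
    haversine('LOCLONG_int', 'LOCLAT_int', 'LOCLONG2_int', 'LOCLAT2_int')
)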
mck
  • Hi, thanks so much! The code looks good and runs, but for some reason applying the method gives me this error: Traceback (most recent call last): File "", line 11, in haversine TypeError: must be real number, not str – Ana Dec 13 '20 at 14:23
  • I took the method from here: https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points – Ana Dec 13 '20 at 14:24
  • Code:

        from math import radians, cos, sin, asin, sqrt

        def haversine(lon1, lat1, lon2, lat2):
            """
            Calculate the great circle distance between two points
            on the earth (specified in decimal degrees)
            """
            # convert decimal degrees to radians
            lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
            # haversine formula
            dlon = lon2 - lon1
            dlat = lat2 - lat1
            a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
            c = 2 * asin(sqrt(a))
            r = 6371 # Radius of earth in kilometers. Use 3956 for miles
            return c * r

    – Ana Dec 13 '20 at 14:27
  • @Ana Did you convert the function to a UDF? Put @F.udf above the line `def haversine(...)`. – mck Dec 13 '20 at 14:28