    from geopy.distance import geodesic
    from pyspark.sql.functions import asc, col, lag
    from pyspark.sql.window import Window

    # Pair each row with the previous position of the same user.
    windowSpec = Window.partitionBy("UserID").orderBy(asc("Timestamp"))
    df14 = df.withColumn("newLatitude", lag("Latitude", 1).over(windowSpec)) \
             .withColumn("newLongitude", lag("Longitude", 1).over(windowSpec)) \
             .drop("AllZero", " Date", "Time", "Altitude")
    df15 = df14.orderBy(col("UserID").asc(), col("Timestamp").asc())
    df16 = df15.na.drop()  # the first row of each user has no previous position

    # geodesic works fine on plain numbers:
    origin = (30.172705, 31.526725)  # (latitude, longitude), in that order
    dist = (30.288281, 31.732326)
    print(geodesic(origin, dist).meters)

    # This is the line that raises the ValueError:
    df17 = df16.withColumn("distance", geodesic((col("Latitude"), col("Longitude")), (col("newLatitude"), col("newLongitude"))).meters)
    df17.show()

I am trying to use the lag function to put the previous Latitude and Longitude next to the current ones in the original df. But when I try to calculate the distance between these two sets of coordinates, it fails with:

    /usr/local/spark/python/pyspark/sql/column.py in __nonzero__(self)
        688
        689     def __nonzero__(self):
    --> 690         raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
        691                          "'~' for 'not' when building DataFrame boolean expressions.")
        692     __bool__ = __nonzero__

    ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

I really don't understand what is going wrong.

Marco Bonelli
CN Z
  • Does this answer your question? [ValueError: Cannot convert column into bool](https://stackoverflow.com/questions/48282321/valueerror-cannot-convert-column-into-bool) – blackbishop Jan 09 '22 at 10:07
  • Sorry, I saw that page and the answer there did not solve my problem, but I somehow managed to solve it with a different approach, which I will post below – CN Z Jan 23 '22 at 22:16

1 Answer

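from geopy.distance import geodesic
from pyspark.sql import functions as F
from pyspark.sql.functions import col
from pyspark.sql.types import FloatType

def dist_col(a, b, c, d):
    # a, b: current latitude/longitude; c, d: previous latitude/longitude
    col_dist = geodesic((a, b), (c, d)).meters
    return col_dist

# the UDF returns a float (distance in meters)
new_f = F.udf(dist_col, FloatType())

df17 = df16.withColumn('dist', new_f(col("Latitude"), col("Longitude"), col("newLatitude"), col("newLongitude")))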

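I created a function that calculates the distance outside of withColumn and wrapped it with udf, which also declares the return type. geodesic is a plain Python function that expects real numbers, but col(...) is only a Column expression, so calling geodesic on it directly fails. The UDF is called once per row with the actual values.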
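If installing geopy on the workers is a problem, the per-row function could also be written with the standard library only. The following is just a sketch, not what I actually ran: `haversine_m` and `EARTH_RADIUS_M` are names I made up for illustration, and it assumes a spherical Earth with a mean radius of 6371 km, so the result will differ slightly from geodesic, which uses an ellipsoid model.

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: spherical Earth with the commonly used mean radius, in meters.
EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    # lag() yields null for the first row of each partition, so pass nulls through.
    if None in (lat1, lon1, lat2, lon2):
        return None
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding errors for near-antipodal points.
    return 2 * EARTH_RADIUS_M * asin(sqrt(min(1.0, h)))


# Same two points as in the question: (latitude, longitude)
print(haversine_m(30.172705, 31.526725, 30.288281, 31.732326))
```

It should be possible to register it the same way as `dist_col`, for example with `F.udf(haversine_m, DoubleType())`, and pass the four columns to it in `withColumn`.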

CN Z
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 23 '22 at 22:23