I'm trying to calculate the number of days between a user's first recorded event in an application and the event that each df row represents. The code below creates a column that compares each row to the previous row, but I need each row compared to the first row of its partition.
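from pyspark.sql import Window
from pyspark.sql.functions import datediff, lag
# Current attempt: compares each row with the previous row for the same user
window = Window.partitionBy('userId').orderBy('dateTime')
df = df.withColumn("daysPassed", datediff(df.dateTime, lag(df.dateTime, 1).over(window)))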
I tried "int(Window.unboundedPreceding)" as the lag offset in place of 1, but that threw an error.
Example of what I'd like the daysPassed column to do:
Row(userId='59', page='NextSong', datetime='2018-10-01', daysPassed=0),
Row(userId='59', page='NextSong', datetime='2018-10-03', daysPassed=2),
Row(userId='59', page='NextSong', datetime='2018-10-04', daysPassed=3)