In PySpark, suppose we have three columns: start_date, duration, and end_date. How can I compare the first row's end_date with the second row's start_date? If the second row's start_date is greater than the first row's end_date, do nothing. Otherwise (i.e., when the first row's end_date is greater than or equal to the second row's start_date), replace the second row's start_date with the first row's end_date, add the second row's duration to that new start_date, and replace the second row's end_date with the result. This should be done for an entire group of rows sharing one ID.
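The row-by-row adjustment described above can be illustrated in plain Python (a sketch, not PySpark): the function name `adjust`, the sample dates, and the assumption that duration is in days are all invented for illustration. Rows are `(start_date, duration, end_date)` tuples for a single ID group, already sorted by start_date:

```python
from datetime import date, timedelta

def adjust(rows):
    # rows: list of (start_date, duration, end_date) for one ID group,
    # sorted by start_date.
    out = [rows[0]]
    for start, dur, end in rows[1:]:
        prev_end = out[-1][2]
        if start > prev_end:
            # No overlap with the previous (possibly adjusted) row: keep as-is.
            out.append((start, dur, end))
        else:
            # Overlap: push start_date to the previous end_date and
            # rebuild end_date by adding duration to the new start_date.
            new_start = prev_end
            out.append((new_start, dur, new_start + timedelta(days=dur)))
    return out

rows = [
    (date(2020, 1, 1), 5, date(2020, 1, 6)),
    (date(2020, 1, 4), 3, date(2020, 1, 7)),
    (date(2020, 1, 10), 2, date(2020, 1, 12)),
]
result = adjust(rows)
for r in result:
    print(r)
```

Note that the second row is shifted to start at the first row's end_date, and the third row is untouched because it starts after the (adjusted) second row ends.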

- It would help others answer your question if you could provide a reproducible example of your dataframe and required output. – murtihash Apr 25 '20 at 15:48
- @MohammadMurtazaHashmi True, but since I am new to Stack Overflow, attaching an image is not allowed for me as of now. I tried attaching an image now; see if you can see it in my post. – pallav kumar Apr 25 '20 at 16:04
- [Please do not post images of code/data as they can't be copied](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question). It would help if you create a reproducible example; take a look at [How to make good reproducible Apache Spark examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples). – anky Apr 25 '20 at 16:35
1 Answer
Use the window lag/lead functions with partitionBy("id") and orderBy("start_date") to compare the first row's end_date with the second row's start_date.

- Use a when/otherwise statement with the datediff function to calculate the difference of dates for the duration column.

– notNull
- Can I look at both rows at once? Using the lag function I know I can define a new column, but here I want to update the end_date sequentially, so can I do this operation in one statement? Can I write a function something like: when (lag 1, window) end_date < start_date, then start_date = (lag 1, window) end_date, end_date = updated start_date + duration; else do nothing. – pallav kumar Apr 25 '20 at 16:24