Is there a way to replace null values in a PySpark DataFrame with the last valid (non-null) value? There are additional timestamp and session columns if you think you need them for window partitioning and ordering. More specifically, I'd like to achieve the following conversion:
+---------+-----------+-----------+      +---------+-----------+-----------+
| session | timestamp |         id|      | session | timestamp |         id|
+---------+-----------+-----------+      +---------+-----------+-----------+
|        1|          1|       null|      |        1|          1|       null|
|        1|          2|        109|      |        1|          2|        109|
|        1|          3|       null|      |        1|          3|        109|
|        1|          4|       null|      |        1|          4|        109|
|        1|          5|        109|  =>  |        1|          5|        109|
|        1|          6|       null|      |        1|          6|        109|
|        1|          7|        110|      |        1|          7|        110|
|        1|          8|       null|      |        1|          8|        110|
|        1|          9|       null|      |        1|          9|        110|
|        1|         10|       null|      |        1|         10|        110|
+---------+-----------+-----------+      +---------+-----------+-----------+
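In case it helps, here is a runnable sketch of the input above, plus the window-based forward fill I've been experimenting with. Using `last(..., ignorenulls=True)` over a per-session window is my best guess at an approach, not a confirmed solution:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Input DataFrame mirroring the left-hand table above.
df = spark.createDataFrame(
    [(1, 1, None), (1, 2, 109), (1, 3, None), (1, 4, None), (1, 5, 109),
     (1, 6, None), (1, 7, 110), (1, 8, None), (1, 9, None), (1, 10, None)],
    ["session", "timestamp", "id"],
)

# Window covering all rows from the start of the session up to the
# current row, ordered by timestamp.
w = (
    Window.partitionBy("session")
    .orderBy("timestamp")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

# last(..., ignorenulls=True) returns the most recent non-null id
# within the window, i.e. a forward fill.
filled = df.withColumn("id", F.last("id", ignorenulls=True).over(w))
filled.show()
```

The `rowsBetween(Window.unboundedPreceding, Window.currentRow)` frame is what makes `last` look backwards only, so the first row stays null, matching the expected output. Is this the right way to do it, or is there a more idiomatic or efficient approach?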