In pyspark, (or SQL) can I use the value calculated in the previous observation in the current observation. (rowwise calculation) (Like SAS Retain)

Question

I want to be able to consecutively go through a table using the value calculated in the previous row in the current row. It seems a window function could do this.

from pyspark.sql import SparkSession
from pyspark.sql import Window
import pyspark.sql.functions as F

# Create a sample DataFrame
data = [
    (1, 1, 3),
    (2, 2, 4),
    (3, 3, 5),
    (4, 4, 6),
    (5, 8, 10)
]

columns = ["id", "start", "stop"]

spark = SparkSession.builder.master("local").appName("SelfJoin").getOrCreate()

df = spark.createDataFrame(data, columns).sort('start')

# 1. Sort the DataFrame by 'start'
df = df.sort("start").withColumn('prev_start', col('start'))

# 2. Initialize a window that looks back one record
window = Window.orderBy(['start']).rowsBetween(-1, -1)

df = (
    df
    .withColumn("prev_start", F.lag("prev_start", 1).over(window))
)

df.toPandas().style.hide_index()

At row one

id	start	stop	prev_start
1	1	3	NaN

At row two

Extra: I need to specify that if value in previous row is missing, then use the value in the current row: 1.0

id	start	stop	prev_start
1	1	3	NaN
2	2	4	1.0

At Row 3:

id	start	stop	prev_start
1	1	3	NaN
2	2	4	1.0
3	3	5	2.0

Look back, prev_start assigned to previous row value of 1

id	start	stop	prev_start
1	1	3	NaN
2	2	4	1.0
3	3	5	1.0

At Row 4

id	start	stop	prev_start
1	1	3	NaN
2	2	4	1.0
3	3	5	1.0
4	4	6	3.0

Assign value of prev_start to what is now 1, not what it was originally: 2

id	start	stop	prev_start
1	1	3	NaN
2	2	4	1.0
3	3	5	1.0
4	4	6	1.0

so now, the actually application, I also do this for prev_end = max(prev_end, end)

as soon as this prev_end is less than the current row start (and the data is ordered correctly) I now have a non-contiguous time frame (start to stop), that will be from prev_start to prev_end

Result

id	start	stop	prev_start
1	1	3	NaN
2	2	4	1.0
3	3	5	2.0
4	4	6	3.0
5	8	10	4.0

But I want