I have a dataframe like this...
+----------+-----+
|      date|price|
+----------+-----+
|2019-01-01|   25|
|2019-01-02|   22|
|2019-01-03|   20|
|2019-01-04|   -5|
|2019-01-05|   -1|
|2019-01-06|   -2|
|2019-01-07|    5|
|2019-01-08|  -11|
+----------+-----+
I want to create a new column whose value depends on other rows, not just on the column values of the same row.
I tried a UDF, but a UDF only sees the values of the row it is applied to. I do not know how to look at other rows.
For example, I would like to create a new column "newprice" that looks like this:
+----------+-----+--------+
|      date|price|newprice|
+----------+-----+--------+
|2019-01-01|   25|      25|
|2019-01-02|   22|      22|
|2019-01-03|   20|      20|
|2019-01-04|   -5|      20|
|2019-01-05|   -1|      20|
|2019-01-06|   -2|      20|
|2019-01-07|    5|       5|
|2019-01-08|  -11|       5|
+----------+-----+--------+
Essentially, each value in the new column is based not only on that row's own values but also on the values of earlier rows.
Logic: if the price is negative, look back at the previous days and take the price from the most recent day that has a positive value (go back one day at a time until a positive value is found). If the price is positive, keep it as is.
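To make the logic concrete, here is a rough plain-Python sketch of what I mean. It is not Spark code, and the function name fill_from_last_positive is only made up for illustration. It assumes the prices are already sorted by date, and it treats a zero price the same as a negative one, which is my assumption since the data has no zeros.

```python
def fill_from_last_positive(prices):
    """Replace each non-positive price with the most recent positive price before it."""
    result = []
    last_positive = None  # stays None until a positive price has been seen
    for price in prices:
        if price > 0:
            # A positive price is kept and becomes the new value to carry forward.
            last_positive = price
        # Assumption: zero is treated like a negative price and also gets replaced.
        result.append(last_positive)
    return result


# Prices in date order, 2019-01-01 through 2019-01-08
prices = [25, 22, 20, -5, -1, -2, 5, -11]
new_prices = fill_from_last_positive(prices)
print(new_prices)  # [25, 22, 20, 20, 20, 20, 5, 5]
```

If the very first prices are negative there is nothing to look back on, so this sketch leaves them as None. What I cannot figure out is how to express the same carry-forward on a Spark DataFrame.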
# Sample data: (date, price) pairs
dateprice = [('2019-01-01', 25), ('2019-01-02', 22), ('2019-01-03', 20), ('2019-01-04', -5),
             ('2019-01-05', -1), ('2019-01-06', -2), ('2019-01-07', 5), ('2019-01-08', -11)]
# sqlContext is the one provided by the PySpark shell
dataDF = sqlContext.createDataFrame(dateprice, ('date', 'price'))
Any help will be highly appreciated.