
I have a DataFrame with a date column, and that column contains duplicate values.

I want another column that shows the next higher date, i.e. the smallest date strictly greater than the current row's date.

Example:

co1  col2  col3
1     abc  1982-07-01
2     abc  1992-07-02
3     abc  1992-07-02
4     abc  1992-07-02
5     abc  2000-07-02
6     abc  2001-07-02

Expected result (col4 = next higher date):

co1  col2  col3        col4
1     abc  1982-07-01  1992-07-02
2     abc  1992-07-02  2000-07-02
3     abc  1992-07-02  2000-07-02
4     abc  1992-07-02  2000-07-02
5     abc  2000-07-02  2001-07-02
6     abc  2001-07-02  Null
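To pin down the rule the expected output follows, here is a minimal pure-Python sketch (plain Python rather than Spark; the function name `next_higher` is hypothetical). Each date maps to the smallest distinct date strictly greater than it, or `None` when there is none, so all duplicate dates share the same result.

```python
from bisect import bisect_right
from datetime import date


def next_higher(values):
    """Hypothetical helper: for each value, return the smallest distinct
    value strictly greater than it, or None if no such value exists."""
    distinct_sorted = sorted(set(values))
    result = []
    for v in values:
        # bisect_right gives the index of the first element strictly greater than v
        i = bisect_right(distinct_sorted, v)
        result.append(distinct_sorted[i] if i < len(distinct_sorted) else None)
    return result


col3 = [date(1982, 7, 1), date(1992, 7, 2), date(1992, 7, 2),
        date(1992, 7, 2), date(2000, 7, 2), date(2001, 7, 2)]
col4 = next_higher(col3)
print(col4)
```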

Please help me resolve this issue.

  • You can use a window `w = Window().partitionBy().orderBy(col("co1"))` and then `lead`: `df.select("*", lead("col3").over(w).alias("col4")).show()`. Also make sure the dates are not string representations of dates; if they are strings, convert them first with `to_date`: `df = df.withColumn("col3",F.to_date("col3"))` – anky Apr 17 '20 at 13:33
  • Does this answer your question? [Spark add new column to dataframe with value from previous row](https://stackoverflow.com/questions/34295642/spark-add-new-column-to-dataframe-with-value-from-previous-row) – anky Apr 17 '20 at 13:33
