
I have a PySpark DataFrame as follows:

[image: input DataFrame]

I need to fill the null values in the EOM column for each id dynamically, based on the last non-null EOM value for that id, so that the months continue consecutively.

My expected output DataFrame looks like this:

[image: expected output DataFrame]

I have tried this logic:

df.where("EOM IS not NULL").groupBy(df['id']).agg(add_months(first(df['EOM']),1))

but the result is not in the expected format.

code_bug

1 Answer

from pyspark.sql.functions import expr, to_date

# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame(
    [("2015-06-23", 5), ("2016-07-20", 7)],
    ("data_date", "months_to_add")
).select(to_date("data_date").alias("data_date"), "months_to_add")

# expr() lets add_months take its month count from another column.
df.withColumn("new_data_date", expr("add_months(data_date, months_to_add)")).show()
samkart