Assume this is my data:
date value
2016-01-01 1
2016-01-02 NULL
2016-01-03 NULL
2016-01-04 2
2016-01-05 3
2016-01-06 NULL
2016-01-07 NULL
2016-01-08 NULL
2016-01-09 1
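For each group of consecutive NULL values, I am trying to find the dates that surround it: the date of the last non-NULL row before the group (start) and the date of the first non-NULL row after it (end). An example output would be as follows: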
start end
2016-01-01 2016-01-04
2016-01-05 2016-01-09
My first attempt at the problem produced the following:
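import org.apache.spark.sql.functions._
// Shift the overall min/max dates of the NULL rows outward by one day
df.filter($"value".isNull)
  .agg(
    to_date(date_add(max("date"), 1)) as "max",
    to_date(date_sub(min("date"), 1)) as "min"
  )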
However, this only finds the overall min and max across all NULL rows, not one start/end pair per group. I thought of using groupBy, but I don't know how to create a column that identifies each block of NULL values.
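For illustration, here is a minimal sketch of one way such a block-identifying column could be built with window functions. It is not necessarily the best approach. It assumes Spark 2.1 or later, a `date` column of DateType, and unique dates; the names `NullBlocks`, `nullBlockBounds`, `block`, `prev`, and `next` are hypothetical. The idea is that a running `count` of `value` skips NULLs, so every row in a NULL run shares the count of the non-NULL row before it, and `lag`/`lead` supply the neighbouring dates.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object NullBlocks {
  // Hypothetical helper: expects a "date" column (DateType) and a nullable "value" column.
  def nullBlockBounds(df: DataFrame): DataFrame = {
    // No partitionBy here, so Spark moves all rows into a single partition
    // for this window; acceptable for a sketch, costly on large data.
    val byDate = Window.orderBy("date")
    val upToHere = byDate.rowsBetween(Window.unboundedPreceding, Window.currentRow)

    df
      // count() ignores NULLs, so the running count only grows on non-NULL rows;
      // all rows of one NULL run therefore carry the same "block" id.
      .withColumn("block", count(col("value")).over(upToHere))
      // Dates of the neighbouring rows, whatever their value.
      .withColumn("prev", lag(col("date"), 1).over(byDate))
      .withColumn("next", lead(col("date"), 1).over(byDate))
      .filter(col("value").isNull)
      .groupBy("block")
      // Earliest "prev" is the row just before the run;
      // latest "next" is the row just after it.
      .agg(min("prev").as("start"), max("next").as("end"))
      .orderBy("start")
      .select("start", "end")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-blocks")
      .getOrCreate()
    import spark.implicits._

    // The sample data from the question.
    val df = Seq[(String, Option[Int])](
      ("2016-01-01", Some(1)), ("2016-01-02", None), ("2016-01-03", None),
      ("2016-01-04", Some(2)), ("2016-01-05", Some(3)), ("2016-01-06", None),
      ("2016-01-07", None), ("2016-01-08", None), ("2016-01-09", Some(1))
    ).toDF("date", "value").withColumn("date", to_date($"date"))

    nullBlockBounds(df).show()
    spark.stop()
  }
}
```

On the sample data this should produce the two rows from the desired output, (2016-01-01, 2016-01-04) and (2016-01-05, 2016-01-09). A NULL run at the very start or end of the data has no neighbour on that side, so its `start` or `end` would be NULL.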