I have a DataFrame like the one below, with three columns:
+--+-----------+-----------+
|id|visit_class|in_date    |
+--+-----------+-----------+
|1 |Non Hf     |24-SEP-2017|
|1 |Non Hf     |23-SEP-2017|
|1 |Hf         |27-SEP-2017|
|1 |Non Hf     |28-SEP-2017|
|2 |Non Hf     |24-SEP-2017|
|2 |Hf         |25-SEP-2017|
+--+-----------+-----------+
I want to group this DataFrame by id, sort each group by the in_date column, and keep only the rows from the first occurrence of Hf onward (including that Hf row). The expected output is shown below: the first two rows are dropped for id = 1, and the first row is dropped for id = 2.
+--+-----------+-----------+
|id|visit_class|in_date    |
+--+-----------+-----------+
|1 |Hf         |27-SEP-2017|
|1 |Non Hf     |28-SEP-2017|
|2 |Hf         |25-SEP-2017|
+--+-----------+-----------+
How can I achieve this in Spark with Scala?
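For reference, here is a rough sketch of one direction that might work, using a window partitioned by id. It is not a confirmed solution. The names fromFirstHf, in_dt and first_hf_dt are made up for illustration. It assumes Spark 2.2 or later (for to_date with a format argument), that the pattern dd-MMM-yyyy parses upper-case month abbreviations such as SEP in the Spark version in use, and that a non-Hf row sharing the date of the first Hf row should be kept.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, min, to_date, when}

object RowsFromFirstHf {

  // Hypothetical helper: keeps, per id, the rows dated on or after
  // that id's earliest "Hf" visit. Ids with no "Hf" row are dropped entirely,
  // because their first_hf_dt is null and the comparison is never true.
  def fromFirstHf(df: DataFrame): DataFrame = {
    val byId = Window.partitionBy("id")

    df
      // Assumption: "dd-MMM-yyyy" matches values like 24-SEP-2017.
      .withColumn("in_dt", to_date(col("in_date"), "dd-MMM-yyyy"))
      // when(...) without otherwise yields null for non-Hf rows; min ignores nulls.
      .withColumn(
        "first_hf_dt",
        min(when(col("visit_class") === "Hf", col("in_dt"))).over(byId))
      .filter(col("in_dt") >= col("first_hf_dt"))
      .orderBy(col("id"), col("in_dt"))
      .drop("in_dt", "first_hf_dt")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rows-from-first-hf")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the table above.
    val df = Seq(
      (1, "Non Hf", "24-SEP-2017"),
      (1, "Non Hf", "23-SEP-2017"),
      (1, "Hf", "27-SEP-2017"),
      (1, "Non Hf", "28-SEP-2017"),
      (2, "Non Hf", "24-SEP-2017"),
      (2, "Hf", "25-SEP-2017")
    ).toDF("id", "visit_class", "in_date")

    fromFirstHf(df).show(false)
    spark.stop()
  }
}
```

The window aggregate avoids a separate groupBy plus join: every row carries its group's first Hf date, so a plain filter is enough. If in_date is already a date column, the to_date step can be skipped.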