For simplicity, let's assume that I have the following DataFrame:
X  Y   Z
A  1   5
A  2  10
A  3  10
B  5  15
I want to group by column X and aggregate by taking the minimum value of Z, but I want Y to be the value from the same row as that minimum Z:
df.groupBy("X").agg(min("Z"), take_y_according_to_min_z("Y"))
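To pin down what the hypothetical `take_y_according_to_min_z` should do, here is a minimal plain-Python sketch over the sample rows. It does not use Spark. It groups by X and keeps the whole row with the smallest Z, so Y stays paired with its Z. The helper name `min_z_with_y` is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question as (X, Y, Z) tuples.
rows = [("A", 1, 5), ("A", 2, 10), ("A", 3, 10), ("B", 5, 15)]


def min_z_with_y(rows):
    """Hypothetical reference: for each X, return the (X, Y, Z) row with the smallest Z."""
    result = []
    # itertools.groupby only merges adjacent items, so sort by X first.
    for _, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # min() with a key returns the whole row, keeping Y next to its Z.
        result.append(min(group, key=itemgetter(2)))
    return result


print(min_z_with_y(rows))  # [('A', 1, 5), ('B', 5, 15)]
```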
Desired output:
X  Y   Z
A  1   5
B  5  15
Note: If more than one row shares the minimum Z value, I don't care which of those rows is taken.
I tried to find a clean, idiomatic Spark solution online. It's clear to me how to do this in MapReduce, but I can't find a way to do it in Spark.
I'm using Spark 1.6.
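The MapReduce approach mentioned above can be sketched in plain Python as a sanity check. Each row is mapped to `(X, (Z, Y))`, and values are then reduced per key by a combiner that keeps the pair with the smaller Z. Because that combiner is associative and commutative, it should, as an untested assumption, be usable with Spark 1.6's `RDD.reduceByKey` after a similar map over `df.rdd`. The `reduce_by_key` function below is a hypothetical stand-in and is not Spark.

```python
from functools import reduce

# Sample rows from the question as (X, Y, Z) tuples.
rows = [("A", 1, 5), ("A", 2, 10), ("A", 3, 10), ("B", 5, 15)]

# Map step: key each row by X and carry (Z, Y) as the value.
mapped = [(x, (z, y)) for x, y, z in rows]


def keep_min_z(a, b):
    # Tuples compare element-wise, so this keeps the pair with the smaller Z
    # (ties on Z are broken by the smaller Y). Being associative and
    # commutative, it can combine partial results in any order.
    return min(a, b)


def reduce_by_key(pairs, func):
    # Hypothetical plain-Python stand-in for a reduce-by-key step.
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


result = reduce_by_key(mapped, keep_min_z)
# Convert back to (X, Y, Z) rows.
print(sorted((x, y, z) for x, (z, y) in result.items()))
# [('A', 1, 5), ('B', 5, 15)]
```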