I need to compare two tables (i.e., DataFrames) in Spark row by row and, for each pair of matching rows, keep the one with the smaller value in a specific column. For example:
Let's say I want to get, for each student, the row with the lower-scored subject, so I want this result:
I was thinking of first joining both DataFrames with id as the join attribute, but my original tables are large and have more attributes. It seems that this should be doable without a join. The closest problem I could find is this, but I don't know how to apply that to my case.
Btw, solutions with a join are also appreciated; I am just wondering whether there is a better solution.