In Apache Spark 2.0+, how do I find the maximum of minimums in the following problem?
df1:
+---+---+
| id| ts|
+---+---+
| 1| 20|
| 2| 15|
+---+---+
df2:
+---+---+
| id| ts|
+---+---+
| 1| 10|
| 1| 25|
| 1| 36|
| 2| 25|
| 2| 35|
+---+---+
The desired DataFrame is:
+---+---+
| id| ts|
+---+---+
| 1| 10|
| 2| 15|
+---+---+
Problem in words: for every id in df1, pick the maximum ts value in df2 (for the same id) that is less than the ts value in df1; if no such value exists, just return the ts value from df1.