I have an RDD like this:
# Structure: a list of (x1, x2, value) tuples
data = [('23', '98', 34), ('23', '89', 39), ('23', '12', 30), ('24', '12', 34), ('24', '14', 37), ('24', '16', 30)]
For each x1, I want the row with the maximum value (score), together with the x2 associated with it, like this:
data = [('23', '89', 39), ('24', '14', 37)]
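To make the expected result concrete, here is a minimal plain-Python sketch of the grouping I mean. It does not use Spark at all, and the helper name `max_per_x1` is made up purely for illustration; ties on the value are resolved by keeping the first row seen, which is my assumption rather than a requirement.

```python
# Plain-Python sketch of the expected result (no Spark involved).
data = [
    ('23', '98', 34), ('23', '89', 39), ('23', '12', 30),
    ('24', '12', 34), ('24', '14', 37), ('24', '16', 30),
]


def max_per_x1(rows):
    """Keep, for each x1, the whole tuple with the largest value."""
    best = {}
    for row in rows:
        x1, value = row[0], row[2]
        # Replace the stored tuple only if this row has a strictly larger value.
        if x1 not in best or value > best[x1][2]:
            best[x1] = row
    return list(best.values())


expected = max_per_x1(data)
print(expected)  # [('23', '89', 39), ('24', '14', 37)]
```

This is only meant to pin down the output I am after; what I need is the equivalent operation on the RDD.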
I tried reduceByKey, but it gives me the max for each combination, which is not what I am looking for.
From comment:
This is what I tried:
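# Assumes `data` is already an RDD here, not the plain list shown above.
max_by_group = (
    data.map(lambda x: (x[0], x))  # key each row by x1
    .reduceByKey(lambda x1, x2: max(x1, x2, key=lambda x: x[-1]))  # keep the row with the larger value
    .values()  # drop the key, keep the full tuple
)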