I am new to Spark. I have the following RDD:
(2, 2.0)
(2, 4.0)
(2, 1.5)
(2, 6.0)
(2, 7.0)
(2, 8.0)
I am trying to convert it to (2, 28.5, 1.5, 8), where 2 is the key, 28.5 is the sum of all values, 1.5 is the minimum value and 8 is the maximum. Needless to say, I have not managed to do it.
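A minimal sketch of one common approach, assuming the pair RDD holds (key, float) records: turn each value into a (sum, min, max) triple before reducing, so the reducer's inputs and its output always have the same shape. The helper names to_triple, combine and summarize_by_key are hypothetical; the Spark part uses only the standard RDD methods mapValues, reduceByKey and map, and the local check uses functools.reduce as a stand-in for Spark.

```python
from functools import reduce


def to_triple(value):
    # Seed each value as (sum, min, max) so every record already has
    # the same shape that the reducer returns.
    return (value, value, value)


def combine(a, b):
    # Merge two (sum, min, max) triples component-wise. The output has the
    # same shape as the inputs, so it can be fed back into combine safely.
    return (a[0] + b[0], min(a[1], b[1]), max(a[2], b[2]))


def summarize_by_key(rdd):
    # Hypothetical helper: rdd is assumed to be a PySpark pair RDD of
    # (key, float). The final map flattens (key, (s, lo, hi)) into
    # (key, s, lo, hi).
    return (rdd.mapValues(to_triple)
               .reduceByKey(combine)
               .map(lambda kv: (kv[0],) + kv[1]))


# Local check without Spark: fold the example values for key 2.
values = [2.0, 4.0, 1.5, 6.0, 7.0, 8.0]
key_summary = reduce(combine, map(to_triple, values))
print((2,) + key_summary)  # prints (2, 28.5, 1.5, 8.0)
```

Applied to the RDD from the question (for example summarize_by_key(lines.map(parsedLines))), this should yield (2, 28.5, 1.5, 8.0). Because combine is associative and commutative, Spark is free to merge partial results from different partitions in any order.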
I have tried the following approaches:
custValuesX = lines.map(parsedLines).reduceByKey(lambda x,y: (x+y,x-y,min(x,y)))
I get the error:
TypeError: can only concatenate tuple (not "float") to tuple
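A likely cause, sketched in plain Python: reduceByKey applies the function pairwise and feeds its previous result back in as one of the arguments, so the function's output must have the same type as its inputs. The lambda above returns a 3-tuple, so on the next call x is a tuple and x + y becomes tuple + float. The snippet uses functools.reduce as a local stand-in for how values are folded within one partition (an assumption; Spark may also merge partial results across partitions, but the type mismatch is the same). The name bad_reducer is hypothetical.

```python
from functools import reduce

values = [2.0, 4.0, 1.5, 6.0, 7.0, 8.0]


def bad_reducer(x, y):
    # Same logic as the first attempt. The first call returns a 3-tuple;
    # on the second call x is that tuple, so x + y is tuple + float.
    return (x + y, x - y, min(x, y))


try:
    reduce(bad_reducer, values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # prints can only concatenate tuple (not "float") to tuple
```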
I also tried this:
custValuesX = (lines.map(parsedLines)
               .mapValues(lambda x: (x, 1))
               .reduceByKey(lambda y, x: (x[0] + y[0], min(y[0], x[0]))))
and the result is:
(2, (28.5, 8.0, 20.5))
I understand the 8.0, since it is the last value in the pair, but where does the 20.5 come from?
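Tracing the second reducer locally suggests where the 20.5 comes from. Assuming the values are folded left to right as functools.reduce does (Spark's actual order depends on partitioning), y is the result of the previous call, so y[0] is a running sum and min(y[0], x[0]) compares that sum with the next value instead of tracking a minimum. The posted lambda returns only two elements, so the three-element output above may come from a slightly different version of the code; still, the trace shows 20.5 as the running sum just before the final 8.0 is added (28.5 - 8.0 = 20.5), and 8.0 as min(20.5, 8.0). The names pairs, trace and reducer are hypothetical.

```python
from functools import reduce

values = [2.0, 4.0, 1.5, 6.0, 7.0, 8.0]
pairs = [(v, 1) for v in values]  # what mapValues(lambda x: (x, 1)) produces

trace = []


def reducer(y, x):
    # Same logic as the posted lambda. y is the previous result, so y[0]
    # is a partial sum and min() compares that sum with the next value.
    result = (x[0] + y[0], min(y[0], x[0]))
    trace.append(result)
    return result


final = reduce(reducer, pairs)
print(trace)  # prints [(6.0, 2.0), (7.5, 1.5), (13.5, 6.0), (20.5, 7.0), (28.5, 8.0)]
print(final)  # prints (28.5, 8.0)
```

Giving every record the full (sum, min, max) shape before reducing, for example with mapValues(lambda v: (v, v, v)), keeps each tuple position meaning the same thing on every call.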
I would really appreciate it if someone could help or explain.