
I am new to Spark. I have the RDD below:

(2, 2.0)
(2, 4.0)
(2, 1.5)
(2, 6.0)
(2, 7.0)
(2, 8.0)

I tried to convert it to (2, 28.5, 1.5, 8), where 2 is the key, 28.5 is the sum of all values, 1.5 is the minimum value, and 8 is the maximum. Needless to say, I have failed to do it.

I have tried the following approaches:

custValuesX = lines.map(parsedLines).reduceByKey(lambda x,y: (x+y,x-y,min(x,y)))

I get this error:

TypeError: can only concatenate tuple (not "float") to tuple
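As far as I can tell, the output of one reduce call gets fed back in as an input to the next call, so after the first step x is already a 3-tuple and x+y tries to add a float to a tuple. A minimal reproduction outside Spark (this just mimics two successive reduce calls, so the call order is an assumption on my part):

f = lambda x, y: (x + y, x - y, min(x, y))

step1 = f(2.0, 4.0)        # (6.0, -2.0, 2.0) -- already a tuple, not a float
try:
    step2 = f(step1, 1.5)  # x is now a tuple, so x + y blows up
except TypeError as e:
    print(e)               # can only concatenate tuple (not "float") to tuple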

Then I tried this:

custValuesX = (lines.map(parsedLines)
                    .mapValues(lambda x: (x, 1))
                    .reduceByKey(lambda y, x: (x[0] + y[0], min(y[0], x[0]))))

and the result is

(2, (28.5, 8.0, 20.5))

I understand the 8.0, as it is the last value in the pair, but what's the deal with 20.5?
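If it helps, here is how I traced the second reducer by hand, assuming a single partition and left-to-right order (the real partitioning and order may differ, so this is only a sketch):

g = lambda y, x: (x[0] + y[0], min(y[0], x[0]))

acc = (2.0, 1)
for v in [(4.0, 1), (1.5, 1), (6.0, 1), (7.0, 1), (8.0, 1)]:
    acc = g(acc, v)
    print(acc)
# (6.0, 2.0) -> (7.5, 1.5) -> (13.5, 6.0) -> (20.5, 7.0) -> (28.5, 8.0)

In this trace 20.5 is just the running sum before the last value is folded in, and the second slot is min(running sum, new value) rather than a minimum of the raw values, which is why it ends up as 8.0. The trace only ever has two slots, though, so I am not sure where the third slot in my actual output comes from.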

I would really appreciate it if someone could help or explain.
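In case it clarifies what I am aiming for, here is a sketch of what I think a shape-consistent reducer would look like, where every value is first mapped to the same (sum, min, max) tuple shape. This is only my guess at the intended approach, not something I am presenting as the right answer:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([(2, 2.0), (2, 4.0), (2, 1.5),
                      (2, 6.0), (2, 7.0), (2, 8.0)])

# Every value gets the same 3-tuple shape, so the reducer's output
# can safely be fed back in as an input on later calls.
stats = (rdd.mapValues(lambda v: (v, v, v))
            .reduceByKey(lambda a, b: (a[0] + b[0],
                                       min(a[1], b[1]),
                                       max(a[2], b[2]))))

print(stats.collect())  # should give [(2, (28.5, 1.5, 8.0))]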
