
My Spark RDD looks something like this:

totalDistance=flightsParsed.map(lambda x:x.distance)
totalDistance.take(5)


[1979.0, 640.0, 1947.0, 1590.0, 874.0]

But when I run reduce on it, I get the error shown below:

totalDistance=flightsParsed.map(lambda x:x.distance).reduce(lambda y,z:y+z)

ValueError: could not convert string to float:

Please help.

desertnaut
[There](https://stackoverflow.com/questions/44950532/pyspark-valueerror-could-not-convert-string-to-float-invalid-literal-for-fl) [are](https://stackoverflow.com/questions/32098641/valueerror-could-not-convert-string-to-float) [several](https://stackoverflow.com/questions/36113328/python-pyspark-error-valueerror-could-not-convert-string-to-float-17) similar questions. I would suggest reading these first, and making sure your data either arrives as floating point or is cast to it before any arithmetic. – Zooby Nov 30 '17 at 04:28

1 Answer


Did you try:

totalDistance=flightsParsed.map(lambda x: int(x.distance or 0))

or

totalDistance=flightsParsed.map(lambda x: float(x.distance or 0))

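You may have missing or inconsistent data inside flightsParsed. Note that take(5) only evaluates the first few records, whereas reduce evaluates every record, so a bad value further into the data would only surface in the reduce call.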
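If the bad values are not just empty but also non-numeric strings, `float(x.distance or 0)` would still raise. A rough sketch of a more defensive conversion is below. The helper name `to_float`, the default of `0.0`, and the sample values are my own assumptions for illustration, not something taken from your data. It also assumes `x.distance` still holds the raw value; if the conversion happens earlier, in whatever function builds flightsParsed, the same helper would need to be applied there instead.

```python
def to_float(value, default=0.0):
    """Return value as a float, or default if it is missing or not numeric.

    Hypothetical helper: the name and the default are assumptions.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None, ValueError covers '' and strings like 'NA'
        return default


# Made-up sample values standing in for the distance field
raw = ["1979.0", "640.0", "", None, "NA", 874.0]
distances = [to_float(v) for v in raw]
total = sum(distances)
print(distances)
print(total)

# With the RDD from the question, the same helper could be used like this:
# totalDistance = (flightsParsed
#                  .map(lambda x: to_float(x.distance))
#                  .reduce(lambda y, z: y + z))
```

Replacing bad values with 0 keeps the sum running but silently hides them, so it may be worth counting how many records fall back to the default before trusting the total.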

user3689574