
In the Spark shell, I run the following code:

scala> val input = sc.parallelize(List(1, 2, 4, 1881824400))
input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> val result = input.map(x => 2*x)
result: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:23

scala> println(result.collect().mkString(","))
2,4,8,-531318496

Why is the result of 2*1881824400 equal to -531318496 and not 3763648800?

Is that a bug in Spark?

Thanks for your help.

  • I believe it's an overflow: http://stackoverflow.com/questions/3001836/how-does-java-handle-integer-underflows-and-overflows-and-how-would-you-check-fo – ccheneson Jul 31 '15 at 13:46
  • 2*1881824400 is greater than the max integer, which is 2^31 - 1. You should use BigInteger instead of Int to be able to get the value you are expecting. – hveiga Jul 31 '15 at 13:48

1 Answer


Thanks ccheneson and hveiga. The answer is that the mapping doubles the value, making the result larger than 2^31 - 1 and thus outside the range of Int. The value therefore wraps around into the negative range.
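A minimal sketch of the fix, continuing from the same spark-shell session (the RDD id echoed by the shell is illustrative): write the constant as 2L so the multiplication is performed in Long, which is wide enough to hold the product.

scala> val result = input.map(x => 2L * x)
result: org.apache.spark.rdd.RDD[Long] = MapPartitionsRDD[2] at map at <console>:23

scala> println(result.collect().mkString(","))
2,4,8,3763648800

BigInt would also work (for example, input.map(x => BigInt(x) * 2)) if the products could exceed Long as well, but Long is sufficient here.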
