
In the Spark shell, I run the following code:

scala> val input = sc.parallelize(List(1, 2, 4, 1881824400))
input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> val result = input.map(x => 2*x)
result: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:23

scala> println(result.collect().mkString(","))
2,4,8,-531318496

Why is the result of 2*1881824400 equal to -531318496 and not 3763648800?

Is that a bug in Spark?

Thanks for your help.

  • I believe it's an overflow: http://stackoverflow.com/questions/3001836/how-does-java-handle-integer-underflows-and-overflows-and-how-would-you-check-fo – ccheneson Jul 31 '15 at 13:46
  • 2*1881824400 is greater than the max integer, which is 2^31 - 1. You should use BigInteger instead of Int to be able to get the value you are expecting. – hveiga Jul 31 '15 at 13:48

1 Answer


Thanks ccheneson and hveiga. The answer is that the mapping doubles the value, making the result larger than 2^31 - 1 and thus outside the range of Int. The value therefore wraps around into the negative range.
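A minimal sketch of the fix, continuing from the same spark-shell session (the RDD id echoed by the shell is illustrative): write the constant as 2L so the multiplication is performed in Long, which is wide enough to hold the product.

scala> val result = input.map(x => 2L * x)
result: org.apache.spark.rdd.RDD[Long] = MapPartitionsRDD[2] at map at <console>:23

scala> println(result.collect().mkString(","))
2,4,8,3763648800

BigInt would also work (for example, input.map(x => BigInt(x) * 2)) if the products could exceed Long as well, but Long is sufficient here.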
