
I use a SparkListener to monitor the sizes of the cached RDDs. However, I notice that no matter what I do, the reported RDD size always remains the same. I did the following to compress the RDDs:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf().setAppName("MyApp")
conf.set("spark.rdd.compress", "true")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.....
val sc = new SparkContext(conf)
....
// persist in serialized form so that spark.rdd.compress can apply
myrdd.persist(StorageLevel.MEMORY_ONLY_SER)
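
In case it matters, here is roughly the kind of listener I mean (a minimal sketch; the class name is illustrative, and the size comes from the RDDInfo entries of each completed stage):

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Sketch of a listener that prints the in-memory size of cached RDDs
class CacheSizeListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    // rddInfos lists every RDD touched by the stage;
    // memSize is the size in bytes of its cached blocks
    stageCompleted.stageInfo.rddInfos
      .filter(_.numCachedPartitions > 0)
      .foreach(info => println(s"${info.name}: ${info.memSize} bytes in memory"))
  }
}

sc.addSparkListener(new CacheSizeListener)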

Even if I remove the second and third lines shown above, the Spark listener reports the same RDD size, which means that setting spark.rdd.compress to true and enabling Kryo serialization had no effect (admittedly, Kryo only affects serialization, but spark.rdd.compress at least should have made a difference). What mistake could I be making?
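
For what it's worth, a couple of sanity checks that can be run from the driver to confirm the settings were picked up (a sketch, using the same myrdd as above):

println(sc.getConf.get("spark.rdd.compress")) // "true" if the setting reached the conf
println(myrdd.getStorageLevel.deserialized)   // false for MEMORY_ONLY_SER, i.e. stored serialized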

Note that my RDD is of type (Long, String). Could that be the reason? I mean, could it be that Spark doesn't compress RDDs of this type, especially when the strings are short?
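
A minimal repro of the shape of my data (the values here are made up, but the types and string lengths match my case):

// Hypothetical data matching my case: an RDD[(Long, String)] with short strings
val myrdd = sc.parallelize(1L to 1000000L).map(i => (i, "val" + i))
myrdd.persist(StorageLevel.MEMORY_ONLY_SER)
myrdd.count() // force materialization so the cached size shows up in the listener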

P.S.: I am using Spark 1.6.
