
I use a SparkListener to monitor the sizes of the cached RDDs. However, I notice that no matter what I do, the reported RDD size always remains the same. I did the following to compress the RDDs:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf().setAppName("MyApp")
conf.set("spark.rdd.compress", "true")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.....
val sc = new SparkContext(conf)
....
// persist in serialized form so that spark.rdd.compress can apply
myrdd.persist(StorageLevel.MEMORY_ONLY_SER)
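
In case it matters, here is roughly the kind of listener I mean (a minimal sketch; the class name is illustrative, and the size comes from the RDDInfo entries of each completed stage):

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Sketch of a listener that prints the in-memory size of cached RDDs
class CacheSizeListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    // rddInfos lists every RDD touched by the stage;
    // memSize is the size in bytes of its cached blocks
    stageCompleted.stageInfo.rddInfos
      .filter(_.numCachedPartitions > 0)
      .foreach(info => println(s"${info.name}: ${info.memSize} bytes in memory"))
  }
}

sc.addSparkListener(new CacheSizeListener)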

Even if I remove the second and third lines shown above, the Spark listener reports the same RDD size, which means that setting spark.rdd.compress to true and enabling Kryo serialization had no effect (admittedly, Kryo only affects serialization, but spark.rdd.compress at least should have made a difference). What mistake could I be making?
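
For what it's worth, a couple of sanity checks that can be run from the driver to confirm the settings were picked up (a sketch, using the same myrdd as above):

println(sc.getConf.get("spark.rdd.compress")) // "true" if the setting reached the conf
println(myrdd.getStorageLevel.deserialized)   // false for MEMORY_ONLY_SER, i.e. stored serialized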

Note that my RDD is of type (Long, String). Could that be the reason? I mean, could it be that Spark doesn't compress RDDs of this type, especially when the strings are short?
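
A minimal repro of the shape of my data (the values here are made up, but the types and string lengths match my case):

// Hypothetical data matching my case: an RDD[(Long, String)] with short strings
val myrdd = sc.parallelize(1L to 1000000L).map(i => (i, "val" + i))
myrdd.persist(StorageLevel.MEMORY_ONLY_SER)
myrdd.count() // force materialization so the cached size shows up in the listener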

P.S.: I am using Spark 1.6.
