I'm caching data with Spark. Right now I use several caches, and some of them take about 20 GB in memory. I tried first with cache() and later with persist and MEMORY_ONLY_SER, and the size was huge, so I changed to Java serialization, getting about 20 GB for some of them. Now I want to use Kryo. I have registered the classes and I don't get any error, but for most of the caches the size is the same as with Java serialization.
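For context, the caching call looks roughly like this; enrichedRdd is just a placeholder name for the RDD[ObjectToCache] that the job builds (that part is not shown here):

import org.apache.spark.storage.StorageLevel

// cache() is the same as persist(MEMORY_ONLY) and stores deserialized objects;
// MEMORY_ONLY_SER stores the serialized bytes, so the serializer choice should
// show up in the cached size. enrichedRdd is a placeholder for the real RDD.
enrichedRdd.persist(StorageLevel.MEMORY_ONLY_SER)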
Some of the objects that I want to cache look like this:
case class ObjectToCache(id: Option[Long],
                         listObject1: Iterable[ObjectEnriched],
                         mp1: Map[String, ObjectEnriched2],
                         mp2: Map[String, ObjectEnriched3],
                         mp3: Map[String, ObjectEnriched4])
I have registered the classes in Kryo as:
kryo.register(classOf[Iterable[ObjectEnriched2]])
kryo.register(classOf[Map[String, ObjectEnriched3]])
kryo.register(classOf[Map[String, ObjectEnriched4]])
kryo.register(classOf[ObjectEnriched])
kryo.register(classOf[ObjectEnriched2])
kryo.register(classOf[ObjectEnriched3])
kryo.register(classOf[ObjectEnriched4])
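These register calls live inside a custom KryoRegistrator that is set on the SparkConf, roughly like this (the registrator and app names below are placeholders; the exact wiring in my job may differ slightly):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // the registrations listed above
    kryo.register(classOf[ObjectEnriched])
    kryo.register(classOf[ObjectEnriched2])
    kryo.register(classOf[ObjectEnriched3])
    kryo.register(classOf[ObjectEnriched4])
  }
}

val conf = new SparkConf()
  .setAppName("streaming-cache")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[MyKryoRegistrator].getName)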
Am I doing something wrong? Is there any way to know whether it is really using Kryo? I think it is, because at some point I got an error when I ran out of space, like this:
Serialization trace:
mp1 (ObjectEnriched)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:183)
at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:37)
at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
I'm using RDDs with Spark Streaming.
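To give an idea of where the persist happens in the streaming job, the skeleton is roughly this (the source, port, and batch interval are placeholders, and conf is the one from the sketch above):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder streaming skeleton, reusing the SparkConf defined earlier.
val ssc = new StreamingContext(conf, Seconds(30))
val lines = ssc.socketTextStream("localhost", 9999)

// persist on a DStream applies to the RDDs generated for every batch;
// MEMORY_ONLY_SER keeps them as serialized bytes in memory.
lines.persist(StorageLevel.MEMORY_ONLY_SER)
lines.count().print()

ssc.start()
ssc.awaitTermination()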