
While searching for the best serialization techniques for Apache Spark, I found the following link: https://github.com/scala/pickling#scalapickling, which states that serialization in Scala will be faster and automatic with this framework.

Scala Pickling also lists several advantages (Ref - https://github.com/scala/pickling#what-makes-it-different).

So I wanted to know whether Scala Pickling (PickleSerializer) can be used in Apache Spark instead of KryoSerializer.

  • If yes, what are the necessary changes? (An example would be helpful; see the configuration sketch below the note.)
  • If no, why not? (Please explain.)

Thanks in advance, and forgive me if I am wrong.

Note: I am using the Scala language to write my Apache Spark (version 1.4.1) application.
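For reference, here is a minimal sketch of how a serializer is wired into Spark 1.4.1 via `SparkConf`. The `KryoSerializer` setting is the documented one; the `PicklingSerializer` class name in the comment is purely hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Spark's serializer is a pluggable setting: whatever class is named in
// "spark.serializer" must extend org.apache.spark.serializer.Serializer.
val conf = new SparkConf()
  .setAppName("serializer-demo") // illustrative app name
  // Documented way to switch from the default Java serialization to Kryo:
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// A pickling-based serializer would have to be plugged in the same way,
// e.g. .set("spark.serializer", "com.example.PicklingSerializer"),
// but "com.example.PicklingSerializer" is hypothetical -- no such class
// exists in Spark or scala/pickling.

val sc = new SparkContext(conf)
```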

Sam
  • Please consider [accepting the answer @Sam](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) – zero323 Mar 21 '17 at 12:57

1 Answer


I visited Databricks for a couple of months in 2014 to try to incorporate a PicklingSerializer into Spark somehow, but I couldn't find a way to include the type information needed by scala/pickling into Spark without changing interfaces in Spark. At the time, it was a no-go to change interfaces in Spark. E.g., RDDs would need to include Pickler[T] type information in their interfaces in order for the generation mechanism in scala/pickling to kick in.
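To illustrate why: based on the usage shown in the scala/pickling README, picklers are resolved as implicits at the call site where the static type is known, which is why that type information would have to be threaded through Spark's own interfaces (the `Person` class here is illustrative):

```scala
// Based on the usage shown in the scala/pickling README.
import scala.pickling.Defaults._
import scala.pickling.json._

case class Person(name: String, age: Int)

// `pickle` needs an implicit Pickler[Person], generated at compile time
// where the static type is known. Inside Spark, an RDD[T] method has no
// way to summon a Pickler[T] unless the RDD interface itself carried it
// (e.g. as an implicit parameter or context bound).
val pkl = Person("foo", 20).pickle
val person = pkl.unpickle[Person]
```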

All of that changed, though, with Spark 2.0.0. If you use Datasets or DataFrames, you get so-called Encoders, which are even more specialized than scala/pickling.

Use Datasets in Spark 2.x. They're much more performant on the serialization front than plain RDDs.
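As a minimal sketch of the Dataset/Encoder approach (the `Record` class, app name, and data are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of Dataset + Encoder usage in Spark 2.x; the Record
// class, app name, and data are illustrative.
case class Record(id: Long, name: String)

object EncoderDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-demo")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._ // derives Encoder[Record] and Encoder[String]

    // toDS() serializes rows with the compile-time derived Encoder[Record]
    // rather than generically serializing whole JVM objects.
    val ds = Seq(Record(1L, "a"), Record(2L, "b")).toDS()
    ds.map(r => r.name.toUpperCase).show()

    spark.stop()
  }
}
```

The derived Encoder writes rows in Spark's compact internal binary format instead of serializing whole JVM objects, which is where the performance win over Kryo-serialized RDDs comes from.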

Heather Miller
  • Thanks for the reply @Heather Miller. But currently I am using Apache Spark 1.4.1 with Kryo serialization, so I think I have to continue with the same :( – Sam Mar 21 '17 at 11:18