
While searching for the best serialization techniques for Apache Spark, I found the following link: https://github.com/scala/pickling#scalapickling, which states that serialization in Scala will be faster and automatic with this framework.

Scala Pickling also lists several advantages (Ref - https://github.com/scala/pickling#what-makes-it-different).

So I wanted to know whether Scala Pickling (PickleSerializer) can be used in Apache Spark instead of KryoSerializer.

  • If yes, what are the necessary changes? (An example would be helpful; see the configuration sketch below the note.)
  • If no, why not? (Please explain.)

Thanks in advance, and forgive me if I am wrong.

Note: I am using the Scala language to write my Apache Spark (version 1.4.1) application.
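For reference, here is a minimal sketch of how a serializer is wired into Spark 1.4.1 via `SparkConf`. The `KryoSerializer` setting is the documented one; the `PicklingSerializer` class name in the comment is purely hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Spark's serializer is a pluggable setting: whatever class is named in
// "spark.serializer" must extend org.apache.spark.serializer.Serializer.
val conf = new SparkConf()
  .setAppName("serializer-demo") // illustrative app name
  // Documented way to switch from the default Java serialization to Kryo:
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// A pickling-based serializer would have to be plugged in the same way,
// e.g. .set("spark.serializer", "com.example.PicklingSerializer"),
// but "com.example.PicklingSerializer" is hypothetical -- no such class
// exists in Spark or scala/pickling.

val sc = new SparkContext(conf)
```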

Sam
  • Please consider [accepting the answer @Sam](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) – zero323 Mar 21 '17 at 12:57

1 Answer


I visited Databricks for a couple of months in 2014 to try to incorporate a PicklingSerializer into Spark somehow, but I couldn't find a way to include the type information needed by scala/pickling into Spark without changing interfaces in Spark. At the time, it was a no-go to change interfaces in Spark. E.g., RDDs would need to include Pickler[T] type information in their interfaces in order for the generation mechanism in scala/pickling to kick in.
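To illustrate why: based on the usage shown in the scala/pickling README, picklers are resolved as implicits at the call site where the static type is known, which is why that type information would have to be threaded through Spark's own interfaces (the `Person` class here is illustrative):

```scala
// Based on the usage shown in the scala/pickling README.
import scala.pickling.Defaults._
import scala.pickling.json._

case class Person(name: String, age: Int)

// `pickle` needs an implicit Pickler[Person], generated at compile time
// where the static type is known. Inside Spark, an RDD[T] method has no
// way to summon a Pickler[T] unless the RDD interface itself carried it
// (e.g. as an implicit parameter or context bound).
val pkl = Person("foo", 20).pickle
val person = pkl.unpickle[Person]
```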

All of that changed, though, with Spark 2.0.0. If you use Datasets or DataFrames, you get so-called Encoders, which are even more specialized than scala/pickling.

Use Datasets in Spark 2.x. They're much more performant on the serialization front than plain RDDs.
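As a minimal sketch of the Dataset/Encoder approach (the `Record` class, app name, and data are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of Dataset + Encoder usage in Spark 2.x; the Record
// class, app name, and data are illustrative.
case class Record(id: Long, name: String)

object EncoderDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-demo")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._ // derives Encoder[Record] and Encoder[String]

    // toDS() serializes rows with the compile-time derived Encoder[Record]
    // rather than generically serializing whole JVM objects.
    val ds = Seq(Record(1L, "a"), Record(2L, "b")).toDS()
    ds.map(r => r.name.toUpperCase).show()

    spark.stop()
  }
}
```

The derived Encoder writes rows in Spark's compact internal binary format instead of serializing whole JVM objects, which is where the performance win over Kryo-serialized RDDs comes from.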

Heather Miller
  • Thanks for the reply @Heather Miller. But currently I am using Apache Spark 1.4.1 with Kryo serialization, so I think I have to continue with the same :( – Sam Mar 21 '17 at 11:18