37

Is there a simple, hassle-free approach to serialization in Scala/Java that's similar to Python's pickle? Pickle is a dead-simple solution that's reasonably efficient in space and time (i.e. not abysmal) but doesn't care about cross-language accessibility, versioning, etc. and allows for optional customization.
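For readers who haven't used pickle, here is a minimal sketch of the "dead-simple, with optional customization" behaviour described above. The `Session` class, its fields, and the `roundtrip` helper are hypothetical names chosen only for illustration; the `__getstate__`/`__setstate__` hooks are optional, and without them pickle just saves and restores the instance's `__dict__`.

```python
import pickle


class Session:
    """Hypothetical class, used only for illustration."""

    def __init__(self, user, token):
        self.user = user
        self.token = token  # pretend this value must not be persisted

    # Optional customization hook: choose what gets serialized.
    def __getstate__(self):
        state = self.__dict__.copy()
        state["token"] = None  # drop the secret before serializing
        return state

    # Optional customization hook: rebuild the instance from saved state.
    def __setstate__(self, state):
        self.__dict__.update(state)


def roundtrip(obj):
    """Serialize to bytes and back using the highest available protocol."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    restored = roundtrip(Session("yang", "s3cret"))
    print(restored.user, restored.token)
```

Nothing has to be registered or marked serializable up front; the hooks are only needed when the default behaviour is not what you want.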

What I'm aware of:

Kryo and protostuff are the closest solutions I've found, but I'm wondering if there's anything else out there (or if there's some way to use these that I should be aware of). Please include usage examples! Ideally also include benchmarks.

Community
Yang
  • 8
    I think it's a bit of an unfair comparison, because Python is naturally much slower than Java. A "fast" serializer in Python is probably significantly slower than a "slow" serializer in Java. – NullUserException Sep 28 '11 at 22:50
  • 1
    @NullUserExceptionఠ_ఠ You're right in that it would be nice to have some way to compare Python pickle vs Java serialization. That said, Pickle (cPickle in Python 2.x) is in C, not Python. – Yang Sep 29 '11 at 00:46
  • From my experience I believe Java serialization is much slower than pickle for comparable tasks (always tricky to compare things across languages). I know for certain it's more bloated than pickle for comparable tasks. Perhaps someone can provide numbers (or maybe I'll eventually find the time to do that)? Also, I think an equally important point between Pickle and Java serialization is that you don't have to rely on everything being marked Serializable. – Yang Sep 29 '11 at 19:12
  • 2
    For kryo there is also the add-on project https://github.com/magro/kryo-serializers, which lets you (de-)serialize objects that have no 0-arg constructor if you're using a sun/oracle jvm. – MartinGrotzke Oct 03 '11 at 20:13
  • 3
    In Kryo 2.x, use `kryo.setInstantiatorStrategy(new StdInstantiatorStrategy())` to get reflection-based constructor instantiation, without any 0-arg constructors. – Raman Oct 31 '12 at 06:47

5 Answers

11

I actually think you'd be best off with kryo (I'm not aware of alternatives that require less schema definition, other than non-binary protocols). You mention that pickle is not susceptible to the slowdowns and bloat that kryo gets without registering classes, but kryo is still faster and more compact than pickle even without registering classes. See the following micro-benchmark (obviously take it with a grain of salt, but this is what I could do easily):

Python pickle

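import pickle
import time

# Python 2 code (xrange, print statement), matching the Python 2.7.1 run reported below.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Alex", 20), Person("Barbara", 25), Person("Charles", 30), Person("David", 35), Person("Emily", 40)]

# Warm-up pass; also prints the serialized size once.
for i in xrange(10000):
    output = pickle.dumps(people, -1)  # -1 selects the highest pickle protocol
    if i == 0: print len(output)

# Timed pass: 10,000 serializations, reported in seconds.
start_time = time.time()
for i in xrange(10000):
    output = pickle.dumps(people, -1)
print time.time() - start_time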

Outputs 174 bytes and 1.18-1.23 seconds for me (Python 2.7.1 on 64-bit Linux)
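The script above is Python 2. A rough Python 3 equivalent is sketched below for anyone who wants to re-run it. The `make_people` and `benchmark` helper names are mine, not part of the original script, and sizes and timings will differ from the figures quoted above, so none are given here.

```python
import pickle
import time


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def make_people():
    """Same five records as the benchmark above."""
    return [Person("Alex", 20), Person("Barbara", 25), Person("Charles", 30),
            Person("David", 35), Person("Emily", 40)]


def benchmark(people, iterations=10000):
    """Return (serialized size in bytes, seconds taken for `iterations` dumps)."""
    size = len(pickle.dumps(people, pickle.HIGHEST_PROTOCOL))
    # Warm-up pass, mirroring the original script.
    for _ in range(iterations):
        pickle.dumps(people, pickle.HIGHEST_PROTOCOL)
    # Timed pass.
    start = time.perf_counter()
    for _ in range(iterations):
        pickle.dumps(people, pickle.HIGHEST_PROTOCOL)
    return size, time.perf_counter() - start


if __name__ == "__main__":
    size, elapsed = benchmark(make_people())
    print(size, "bytes;", elapsed, "seconds")
```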

Scala kryo

import com.esotericsoftware.kryo._
import java.io._

class Person(val name: String, val age: Int)

object MyApp extends App {
  val people = Array(new Person("Alex", 20), new Person("Barbara", 25), new Person("Charles", 30), new Person("David", 35), new Person("Emily", 40))
  val kryo = new Kryo
  // Allow serializing classes that have not been registered up front.
  kryo.setRegistrationOptional(true)
  val buffer = new ObjectBuffer(kryo)
  // Warm-up pass; also prints the serialized size once.
  for (i <- 0 until 10000) {
    val output = new ByteArrayOutputStream
    buffer.writeObject(output, people)
    if (i == 0) println(output.size)
  }
  // Timed pass: 10,000 serializations, reported in seconds.
  val startTime = System.nanoTime
  for (i <- 0 until 10000) {
    val output = new ByteArrayOutputStream
    buffer.writeObject(output, people)
  }
  println((System.nanoTime - startTime) / 1e9)
}

Outputs 68 bytes and 30-40 ms for me (Kryo 1.04, Scala 2.9.1, Java 1.6.0.26 hotspot JVM on 64-bit Linux). For comparison, it outputs 51 bytes and takes 18-25 ms if I register the classes.

Comparison

Kryo uses about 40% of the space and 3% of the time that Python pickle does when not registering classes (68 vs. 174 bytes; 30-40 ms vs. about 1.2 s), and about 30% of the space and 2% of the time when registering classes (51 bytes; 18-25 ms). And you can always write a custom serializer when you want more control.

Mike
  • 1
    Thanks for the numbers. I can't seem to deserialize the objects, though, which is a severe limitation (updated my question with this). I get "com.esotericsoftware.kryo.SerializationException: Unable to deserialize object of type: Person" caused by "com.esotericsoftware.kryo.SerializationException: Class cannot be created (missing no-arg constructor): Person". – Yang Oct 03 '11 at 19:59
  • 5
    If you're using a sun/oracle jvm you can use https://github.com/magro/kryo-serializers for deserializing objects without a 0-arg constructor: just change "new Kryo" to "new KryoReflectionFactorySupport". – MartinGrotzke Oct 03 '11 at 20:17
  • @MartinGrotzke: You have made me a very happy person today. Impressive work. I see myself pimping around your library already. I really hope it makes it into kryo itself. – Yang Oct 11 '11 at 05:12
  • @Yang Cool! Looking forward to your pull request :-) – MartinGrotzke Oct 11 '11 at 11:55
  • 7
    In Kryo 2.x, use kryo.setInstantiatorStrategy(new StdInstantiatorStrategy()) to get reflection-based constructor instantiation, without any 0-arg constructors. – Raman Oct 31 '12 at 06:48
  • I just wanted to note that StdInstantiatorStrategy is not part of kryo; it is part of objenesis http://objenesis.googlecode.com/svn/docs/index.html – mikkom Apr 19 '13 at 08:45
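For contrast with the "missing no-arg constructor" error discussed in these comments: by default, pickle restores an instance without calling `__init__` at all, so a class with only a two-argument constructor round-trips fine. A small sketch to check this; the `init_calls` counter and the `roundtrip` helper are additions for demonstration only.

```python
import pickle


class Person:
    init_calls = 0  # counts constructor invocations, for demonstration only

    def __init__(self, name, age):  # there is no zero-argument form
        Person.init_calls += 1
        self.name = name
        self.age = age


def roundtrip(obj):
    """Serialize to bytes and back using the highest available protocol."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    original = Person("Alex", 20)
    copy = roundtrip(original)
    # The counter reflects only the explicit construction above;
    # loading the pickle does not run __init__ again.
    print(copy.name, copy.age, Person.init_calls)
```

This is roughly the behaviour that the objenesis-based instantiation mentioned above brings to kryo.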
9

Edit 2020-02-19: please note, as mentioned by @federico below, this answer is no longer valid as the repository has been archived by the owner.

Scala now has Scala-pickling, which performs as well as or better than Kryo, depending on the scenario. See slides 34-39 in this presentation.

Arnon Rotem-Gal-Oz
  • I don't think this is worth a full-blown question, so I'll ask here: is it currently possible to use this library on Android? If so, what toolchain should I use? Is it possible with Eclipse + ADT + https://github.com/banshee/AndroidProguardScala, or will I need something more clever (sbt, maven, ...)? – Display Name Nov 06 '13 at 20:20
  • I haven't tried using it on Android so I can't tell. You can try and post questions on stackoverflow if you find any problems :) – Arnon Rotem-Gal-Oz Nov 07 '13 at 07:56
  • "This repository has been archived by the owner. It is now read-only." – Federico Feb 19 '20 at 14:42
7

Twitter's chill library is just awesome. It uses Kryo for serialization but is ultra simple to use. Also nice: it provides a MeatLocker[X] type, which makes any X Serializable.

ib84
4

I would recommend SBinary. It uses implicits, which are resolved at compile time, so it's very efficient and type-safe. It comes with built-in support for many common Scala datatypes. You have to write the serialization code for your (case) classes by hand, but it's easy to do.

A usage example for a simple ADT

Jesper Nordenberg
  • What's going on in line 18 to 40 here? https://github.com/harrah/sbinary/blob/master/core/src/standardtypes.scala – Knut Arne Vedaa Sep 29 '11 at 14:40
  • 1
    I've seen SBinary. The point is that you have to write your own serialization code. This is probably the most verbose of all the options I've listed, which is why I decided it didn't make the cut into my list. – Yang Sep 29 '11 at 19:20
  • @KnutArneVedaa Authors use a preprocessor (FMPP) to generate tuple formats. See the sbt project definition. – paradigmatic Sep 29 '11 at 19:48
  • 1
    The guy says that declaring an empty constructor is a big deal for him and you suggest a custom serializer as a solution? – Val Sep 04 '12 at 22:00
0

Another good option is the recent (2016) **netvl/picopickle**:

  • Small and almost dependency-less (the core library depends only on shapeless).
  • Extensibility: you can define your own serializers for your types and create custom backends, that is, you can use the same library for different serialization formats (collections, JSON, BSON, etc.); other parts of the serialization behavior, such as null handling, can also be customized.
  • Flexibility and convenience: the default serialization format is fine for most uses, but it can be customized almost arbitrarily with support from a convenient converters DSL.
  • Static serialization without reflection: shapeless Generic macros are used to provide serializers for arbitrary types, which means that no reflection is used.

For example:

Jawn-based pickler also provides additional functions, readString()/writeString() and readAst()/writeAst(), which [de]serialize objects to strings and JSON AST to strings, respectively:

import io.github.netvl.picopickle.backends.jawn.JsonPickler._

case class A(x: Int, y: String)

writeString(A(10, "hi")) shouldEqual """{"x":10,"y":"hi"}"""
readString[A]("""{"x":10,"y":"hi"}""") shouldEqual A(10, "hi")
VonC