52

I'm currently working on a project which needs to persist any kind of object (of which implementation we don't have any control) so these objects could be recovered afterwards.

We can't implement an ORM because we can't restrict the users of our library at development time.

Our first alternative was to serialize the objects with Java's default serialization, but we had a lot of trouble recovering them once users started to pass different versions of the same class (attributes changed types, names, ...).

We also tried the XMLEncoder class (which transforms an object into XML), but we found it lacking in functionality (it doesn't support enums, for example).
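For reference, a minimal sketch of the kind of XMLEncoder round trip described here, using only the JDK's `java.beans` package. The `Person` bean and the class name `XmlEncoderSketch` are hypothetical; the sketch assumes a plain JavaBean with a public no-argument constructor and getter/setter pairs, which is what XMLEncoder is built around.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class XmlEncoderSketch {

    // Hypothetical bean: public class, public no-arg constructor,
    // and a getter/setter pair for every persisted property.
    public static class Person {
        private String name;
        private int age;

        public Person() { }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Encodes a bean into an XML document held in memory.
    static byte[] toXml(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return out.toByteArray();
    }

    // Decodes the first object from an XML document produced by toXml.
    static Object fromXml(byte[] xml) {
        ClassLoader loader = XmlEncoderSketch.class.getClassLoader();
        try (XMLDecoder decoder =
                 new XMLDecoder(new ByteArrayInputStream(xml), null, null, loader)) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Person original = new Person();
        original.setName("Ada");
        original.setAge(36);

        byte[] xml = toXml(original);
        System.out.println(new String(xml, StandardCharsets.UTF_8));

        Person restored = (Person) fromXml(xml);
        System.out.println(restored.getName() + " " + restored.getAge());
    }
}
```

Classes that do not follow the bean conventions are where this approach tends to need extra work, which is the limitation the question is about.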

Finally, we also tried JAXB, but this forces our users to annotate their classes.

Any good alternative?

Jim Ferrans
Alotor

13 Answers

47

It's 2011, and in a commercial grade REST web services project we use the following serializers to offer clients a variety of media types:

  • XStream (for XML but not for JSON)
  • Jackson (for JSON)
  • Kryo (a fast, compact binary serialization format)
  • Smile (a binary format that comes with Jackson 1.6 and later).
  • Java Object Serialization.

We experimented with other serializers recently:

  • SimpleXML seems solid, runs at 2x the speed of XStream, but requires a bit too much configuration for our situation.
  • YamlBeans had a couple of bugs.
  • SnakeYAML had a minor bug relating to dates.

Jackson JSON, Kryo, and Jackson Smile were all significantly faster than good old Java Object Serialization, by about 3x to 4.5x. XStream is on the slow side. But these are all solid choices at this point. We'll keep monitoring the other three.

Matt
Jim Ferrans
  • 2
    Very good set of serializers -- about the only other one I might mention is protostuff. Also: Jackson now has an XML module (https://github.com/FasterXML/jackson-dataformat-xml), which works reasonably well as a potential alternative to XStream, and is quite a bit faster where it works well. And there is even a YAML plug-in, if YAML output is needed (https://github.com/FasterXML/jackson-dataformat-yaml); these all use the same data binding, just different low-level parsers/generators. – StaxMan May 04 '12 at 05:41
  • I'd go with SimpleXML. It is highly customizable and gives you everything you need with simple and logic annotations. – Bitcoin Cash - ADA enthusiast Jul 10 '14 at 22:39
  • 1
    I've worked with the author of Jackson for many years and he does a phenomenal job (fast response, solid codebase). I would vouch for any project he works on (so I recommend you give Smile a try). – Gili Nov 22 '14 at 19:48
  • Kryo absolutely rocks! I prefer a binary format any day (don't know why!) :D – ankush981 Nov 18 '16 at 15:54
  • 1
    I would add AVRO and Protobuf – Fernando Miguélez Jul 20 '22 at 07:12
20

http://x-stream.github.io/ is nice, please take a look at it! Very convenient

facundofarias
krosenvold
16

of which implementation we don't have any control

The solution is don't do this. If you don't have control of a type's implementation you shouldn't be serialising it. End of story. Java serialisation provides serialVersionUID specifically for managing serialisation incompatibilities between different versions of a type. If you don't control the implementation you cannot be sure that IDs are being changed correctly when a developer changes a class.

Take a simple example of a 'Point'. It can be represented by either a cartesian or a polar coordinate system. It would be cost prohibitive for you to build a system that could cope dynamically with these sorts of corrections - it really has to be the developer of the class who designs the serialisation.
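As an illustration of the class author designing the serialised form, here is a minimal sketch of a `Point` that happens to store polar coordinates internally but always writes cartesian `x`/`y` to the stream. The class, its field names, and the `roundTrip` helper are hypothetical; the only library calls are the standard `java.io` serialization hooks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PointFormSketch {

    // Hypothetical class: polar internally, cartesian on the wire, so the
    // internal representation can change without breaking stored data.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: excluded from the default serialized form
        private transient double radius;
        private transient double angle;

        Point(double x, double y) {
            this.radius = Math.hypot(x, y);
            this.angle = Math.atan2(y, x);
        }

        double x() { return radius * Math.cos(angle); }
        double y() { return radius * Math.sin(angle); }

        // Writes the logical state (x, y) rather than the internal fields.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeDouble(x());
            out.writeDouble(y());
        }

        // Rebuilds the internal polar state from the stored x and y.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            double x = in.readDouble();
            double y = in.readDouble();
            this.radius = Math.hypot(x, y);
            this.angle = Math.atan2(y, x);
        }
    }

    // Serializes and immediately deserializes a point in memory.
    static Point roundTrip(Point point) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(point);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3.0, 4.0));
        System.out.println(copy.x() + ", " + copy.y());
    }
}
```

Only the author of `Point` knows that `x` and `y` are the stable logical state, which is why a generic library cannot make this choice on their behalf.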

In short it's your design that's wrong - not the technology.

johnstok
  • I don't doubt that this isn't the perfect solution, but it's a requirement imposed on me externally. We are developing a library and there is a necessity to register the calls through this library. – Alotor Oct 27 '08 at 14:27
  • Disagree. With fst-serialization you can optionally inject the serialization code even for classes you don't control (serializer class). Also, explicit versioning is not a problem; it's just that the JDK method isn't that well thought out. – R.Moeller Jan 08 '15 at 13:12
13

The easiest thing for you to do is still to use serialization, IMO, but put more thought into the serialized form of the classes (which you really ought to do anyway). For instance:

  1. Explicitly define the serialVersionUID.
  2. Define your own serialized form where appropriate.

The serialized form is part of the class' API and careful thought should be put into its design.
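A minimal sketch of the first point, assuming a hypothetical `Order` class: the `serialVersionUID` is pinned explicitly instead of being computed from the class shape, and a derived field is marked `transient` so it stays out of the serialized form. `ObjectStreamClass.lookup` is the standard JDK call for inspecting the UID a class will use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialUidSketch {

    // Hypothetical class with an explicitly pinned serialVersionUID.
    static final class Order implements Serializable {
        private static final long serialVersionUID = 42L;

        private final String id;
        private final int quantity;
        // Derived value: transient keeps it out of the serialized form.
        private transient String cachedLabel;

        Order(String id, int quantity) {
            this.id = id;
            this.quantity = quantity;
        }

        String label() {
            if (cachedLabel == null) {
                cachedLabel = id + " x" + quantity;
            }
            return cachedLabel;
        }
    }

    // Returns the UID that Java serialization will write for this class.
    static long declaredUid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    // Serializes and immediately deserializes a value in memory.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredUid(Order.class));
        System.out.println(roundTrip(new Order("A-1", 3)).label());
    }
}
```

With the UID pinned, compatible changes such as adding a field no longer invalidate previously stored objects by themselves; incompatible changes still need a deliberate decision by the class author.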

I won't go into a lot of detail, since pretty much everything I have said comes from Effective Java. Instead, I'll refer you to it, specifically the chapter on serialization. It warns you about all the problems you're running into, and provides proper solutions to them:

http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch/dp/0321356683


With that said, if you're still considering a non-serialization approach, here are a couple:

XML marshalling

As many have pointed out, this is an option, but I think you'll still run into the same problems with backward compatibility. However, with XML marshalling, you'll hopefully catch these right away, since some frameworks may do some checks for you during initialization.

Conversion to/from YAML

This is an idea I have been toying with, but I really liked the YAML format (at least as a custom toString() format). But really, the only difference for you is that you'd be marshalling to YAML instead of XML. The only benefit is that YAML is slightly more human-readable than XML. The same restrictions apply.

Jim Ferrans
Jack Leow
11

Google came up with a binary protocol, Protocol Buffers (http://code.google.com/apis/protocolbuffers/), which is faster and has a smaller payload than the XML that others have suggested as an alternative.

One of the advantages of Protocol Buffers is that it can exchange data with C, C++, Python and Java.

anjanb
  • Unfortunately, you can not really use it for arbitrary POJOs because you (a) need schemas for every type and (b) must code-generate objects to use. – StaxMan May 04 '12 at 05:42
  • And besides that, Protocol Buffers aren't designed for possibly circular references or for object sharing (although this could be hacked in, just as the JSync concept (http://jsync.org/) adds object references to JSON) – Qwertie Aug 16 '13 at 22:05
6

Also a very fast JDK serialization drop-in replacement: http://ruedigermoeller.github.io/fast-serialization/

Jim Ferrans
R.Moeller
  • Does not support any newer Java version, nothing above 8 – newOne Jan 29 '20 at 09:48
  • We are in 2023 and FST (in its v.2.x and 3.x flavours) has support for JDK17 and even JDK20. And, in my opinion, this is the best of both worlds: the same configuration (pretty simply done) can be used for binary and JSON outputs. And both are much faster (and have a smaller footprint) than the stock Java serialisation and, for instance, Gson. It also scales up pretty well. – Alexis Drogoul Aug 02 '23 at 04:06
5

Try serializing to JSON, with Gson for example.

Benno Richters
4

If serialization speed is important to you then there is a comprehensive benchmark of JVM serializers here:

Andrejs
1

Possibly Castor?

toolkit
1

Personally, I use Fame a lot, since it features interoperability with Smalltalk (both VW and Squeak) and Python. (Disclaimer, I am the main contributor of the Fame project.)

akuhn
1

Betwixt is a good library for serializing objects - but it's not going to be an automatic kind of thing. If the number of objects you have to serialize is relatively fixed, this may be a good option for you, but if your 'customer' is going to be throwing new classes at you all the time, it may be more effort than it's worth (Definitely easier than XMLEncoder for all the special cases, though).

Another approach is to require your customer to provide the appropriate .betwixt files for any objects they throw at you (that effectively offloads the responsibility to them).

Long and short - serialization is hard - there is no completely brain dead approach to it. Java serialization is as close to a brain dead solution as I've ever seen, but as you've found, incorrect use of the version uid value can break it. Java serialization also requires use of the marker 'Serializable' interface, so if you can't control your source, you are kind of out of luck on that one.
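To make the marker-interface point concrete, here is a small sketch using only `java.io`. The `Plain` and `Marked` classes and the `canSerialize` helper are hypothetical; the behaviour shown (a `NotSerializableException` for a class that does not implement `Serializable`) is standard Java serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MarkerInterfaceSketch {

    // Hypothetical third-party class that does not implement Serializable.
    static final class Plain {
        int value = 1;
    }

    // Same shape, but opted in through the marker interface.
    static final class Marked implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    // Reports whether Java serialization accepts the given object.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the object, or something it references,
            // does not implement Serializable.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Plain()));
        System.out.println(canSerialize(new Marked()));
    }
}
```

A library that must accept arbitrary user classes cannot add the marker itself, which is why this route depends on the class author's cooperation.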

If the requirement is truly as arduous as you describe, you may have to resort to some sort of BCE (Byte code modification) on the objects / aspects / whatever. This is getting way outside the realm of a small development project, and into the realm of Hibernate, Casper or an ORM....

Kevin Day
0

SBE is an established library for fast, ByteBuffer-based serialization and is capable of versioning. However, it is a bit hard to use, as you need to write lengthy wrapper classes over it.

In light of its shortcomings, I have recently made a Java-only serialization library inspired by SBE and FIX-protocol (common financial market protocol to exchange trade/quote messages), that tries to keep the advantages of both while overcoming their weaknesses. You can take a look at https://github.com/iceberglet/anymsg

Iceberglet
-1

Another idea: use a cache. Caches provide much better control, scalability and robustness to the application. You still need to serialize, though, but the management becomes much easier within a caching service framework. A cache can be persisted in memory, disk, database or array - or all of the options - with one being overflow, standby or fail-over for the other. Commons JCS and Ehcache are two Java implementations; the latter is an enterprise solution free up to 32 GB of storage (disclaimer: I don't work for Ehcache ;-)).

Net Dawg