Java Externalization vs Transient

Question

I was thinking about the purpose of Externalisation, given that you could simply label a property as transient and prevent its serialisation. However, upon further research I found out that this approach (i.e. labelling as transient) may not be ideal if you need to decide what's required at run-time. Theoretically, it makes sense to me. However, practically I don't see how Externalisation is more run-time friendly. I mean, you still have to decide what's required or not within the writeExternal() and readExternal() during the definition of the class. So, how is that more run-time friendly?

The document that highlighted this is as follows,

If everything is automatically taken care of by implementing the Serializable interface, why would anyone like to implement the Externalizable interface and bother to define the two methods? Simply to have complete control over the process. Okay... let's take a sample example to understand this. Suppose we have an object having hundreds of fields (non-transient) and we want only a few fields to be stored on the persistent storage and not all. One solution would be to declare all other fields (except those which we want to serialize) as transient and the default Serialization process will automatically take care of that. But what if those few fields are not fixed at design time and are instead conditionally decided at runtime? In such a situation, implementing the Externalizable interface will probably be a better solution. Similarly, there may be scenarios where we simply don't want to maintain the state of the superclasses (which is automatically maintained by the Serializable interface implementation).

Do you actually need to use `Externalizable`, or would `Serializable` work? You can still provide custom read/write methods. — Andy Turner, Jun 30 '15 at 07:00
And where does `transient` come into this? What do you want to make `transient`? — Andy Turner, Jun 30 '15 at 07:02
I wouldn't say it's more "runtime friendly", it's just more flexible. As you can shape and filter your output/input using conditions, iterations, ... — bvdb, Jun 30 '15 at 07:02
@AndyTurner Please note that it's a fictitious code snippet. The purpose of my question is to understand how the above example is more "run-time" friendly in comparison to using `Serializable` + `transient`. — Grateful, Jun 30 '15 at 07:06
What do you mean by "runtime friendly"? What is it that you are trying to achieve? — Andy Turner, Jun 30 '15 at 07:07
@AndyTurner I presume the text that I found was talking about the ability to decide which variables will be serialised at run-time. — Grateful, Jun 30 '15 at 07:09
I don't know which text you are referring to. In any case, whether you are using serialisation or externalisation, I don't see what "runtime" flexibility either gives: the entire behaviour is determined at compile time; the only thing that changes at runtime is the data in the fields. — Andy Turner, Jun 30 '15 at 07:13
@AndyTurner I am simply trying to understand the document that I read and have now quoted above. — Grateful, Jun 30 '15 at 07:14
A simple example of a good use for Externalisation is if i have create a custom collection implementation. Rather than serialize the internal data structures such as arrays, hashing buckets etc, I could just write out the stored values. Upon loading them back in, i would just add them to the collection using the normal put or add methods. The saved data is simple and neat, and I can then change the implementation of my class without it breaking when i reload serialized data. — slipperyseal, Jun 30 '15 at 07:20
This is an hashmap i wrote (Im not saying it's very good or anything), but it demonstrates the point i was just making… https://github.com/slipperyseal/atomicobjects/blob/master/atomicobjects-lang/src/main/java/net/catchpole/collection/OpenHashMap.java — slipperyseal, Jun 30 '15 at 07:24
@Grateful I added my answer: look in the - flexibility section :) ... (pfew there really is a lot to write about this topic) — bvdb, Jun 30 '15 at 08:26
@Grateful ... and just for the fun of it, added something about JAXB, because I'm wondering if people who use Serialization/Externalization realize that there is a popular third option: JAXB. — bvdb, Jun 30 '15 at 08:44
@EJP the 2008 _Geek Explains_ article ["Diff between Externalizable and Serializable in Java"](http://geekexplains.blogspot.com/2008/06/diff-between-externalizable-and.html) seems to be the document quoted — fspinnenhirn, Feb 11 '16 at 22:00

bvdb · Answer 1 · 2015-06-30T11:09:51.533

I would like to point out that there are other advantages/disadvantages to consider when comparing Serializable and Externalizable methods.

Externalizing is faster

During serialization the runtime (ObjectOutputStream) will always first check if the class is Externalizable. If that's the case then it will call the class's writeExternal/readExternal methods instead of processing the fields itself. (makes sense, right)

Externalizable classes need less recursion, as you can precisely identify what data you need. It also results in a more compact output (fewer bytes), which brings us to the next point ...

Externalized output is more compact

If you would compare the actual output, it would look something like this: The header of the object contains a flag that marks if the class is just Serializable or maybe also Externalizable.

OBJECT
CLASSDESC
  Class Name: "MyClassName"
  Class UID:  ...
  Class Desc Flags: SERIALIZABLE or EXTERNALIZABLE

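If it's just SERIALIZABLE, then a list of fields will follow (like a definition), followed by the actual data. The field list belongs to the class description, which is written once per class in a stream; later objects of the same class refer back to it, while the data is written for every serialized object.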

  Field Count: ...
  // followed by a bunch of declarations of objects
  Field type: object
  Field name: "fieldName"
  Class name: "Ljava/lang/String;"

 // followed by the actual data
 STRING: "foo"
 STRING: "bar"
 float: 123456

Externalizable objects don't contain a list of fields and data, they just contain the encoded data in the order that you saved it.

  EXTERNALIZABLE: [00 AA 00 BC ... ]
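
The size difference is easy to measure. The following is only a minimal sketch: PlainPoint, ExternalPoint and SizeComparison are hypothetical names made up for this comparison, and both classes hold the same three values. For a single small object like this, the Externalizable form should come out smaller because no field descriptions are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class that relies on default serialization.
class PlainPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    int x = 1;
    int y = 2;
    String label = "origin";
}

// The same data, written by hand through Externalizable.
class ExternalPoint implements Externalizable {
    private static final long serialVersionUID = 1L;
    int x = 1;
    int y = 2;
    String label = "origin";

    public ExternalPoint() { }   // public no-arg constructor is required

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
        out.writeUTF(label);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must read in exactly the order used by writeExternal.
        x = in.readInt();
        y = in.readInt();
        label = in.readUTF();
    }
}

class SizeComparison {
    // Serializes one object into memory and returns the number of bytes written.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("Serializable:   " + serializedSize(new PlainPoint()) + " bytes");
        System.out.println("Externalizable: " + serializedSize(new ExternalPoint()) + " bytes");
    }
}
```

Note that the gap narrows when many objects of the same class go into one stream, since the class description is not repeated for each of them.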

Externalizing is more flexible

If you save a shopping list, then you only want the product names, right ?

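import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectOutput;
import java.util.List;

public class ShoppingList implements Externalizable {
  String name;
  List<Product> productList;

  @Override
  public void writeExternal(ObjectOutput out) throws IOException
  {
    out.writeUTF(name);
    // write the count first so readExternal knows how many entries follow
    out.writeInt(productList.size());
    for (Product product : productList)
    {
      // save only product id
      out.writeUTF(product.getEanCode());
    }
  }
  ...
}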

But if you are making a bill, then you also want to save prices right ?

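import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectOutput;
import java.util.List;

public class Bill implements Externalizable {
  String name;
  List<Product> productList;

  @Override
  public void writeExternal(ObjectOutput out) throws IOException
  {
    out.writeUTF(name);
    // write the count first so readExternal knows how many entries follow
    out.writeInt(productList.size());
    for (Product product : productList)
    {
      // save product id and price
      out.writeUTF(product.getEanCode());
      out.writeInt(product.getPrice());
    }
  }
  ...
}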

So, in some cases the price is effectively transient and in some cases it is not. How would you solve this with the transient keyword? The keyword is fixed per field when the class is compiled, so it cannot differ from one use of Product to another. This kind of flexibility is really missing when using only the transient keyword.

Design considerations

However, there are some dangers as well. Externalizable can only be implemented by classes with a public default constructor (a public constructor without arguments), because deserialization creates the object through that constructor before calling readExternal.

That makes it impossible to make non-static inner classes Externalizable. The problem is that the compiler adds a hidden parameter, a reference to the enclosing instance, to every constructor of a non-static inner class. So you cannot have a true no-argument constructor for a non-static inner class.
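
The inner-class limitation can be checked directly. This is a minimal sketch with hypothetical names (Outer, Inner, Nested, InnerClassDemo): writing the inner-class instance works, but reading it back is rejected with an InvalidClassException, while the static nested class round-trips normally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class Outer {
    // Non-static inner class: its constructor implicitly takes an Outer,
    // so deserialization finds no real no-arg constructor to call.
    public class Inner implements Externalizable {
        public Inner() { }
        @Override public void writeExternal(ObjectOutput out) { }
        @Override public void readExternal(ObjectInput in) { }
    }

    // Static nested class: the public constructor really takes no arguments.
    public static class Nested implements Externalizable {
        public Nested() { }
        @Override public void writeExternal(ObjectOutput out) { }
        @Override public void readExternal(ObjectInput in) { }
    }
}

class InnerClassDemo {
    // Returns true if the object survives a write/read cycle,
    // false if deserialization rejects its class.
    static boolean canRoundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (InvalidClassException e) {
            return false;   // the class has no usable no-arg constructor
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Outer outer = new Outer();
        System.out.println("inner:  " + canRoundTrip(outer.new Inner()));
        System.out.println("nested: " + canRoundTrip(new Outer.Nested()));
    }
}
```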

You also have to consider the possibility of modifying your object in future (e.g. adding non-transient fields). Serializable classes could have backwards compatibility issues, but don't require code changes per se. Externalizable classes will require a code change in your read/write method, but have more options to handle compatibility issues.
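
One common way to use those extra options is to write a format version first and branch on it when reading. The sketch below is only an illustration under assumptions: Customer, VersionDemo and the versionToWrite field are hypothetical, and versionToWrite exists purely so that old-format data can be produced here without an old build of the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class Customer implements Externalizable {
    private static final long serialVersionUID = 1L;
    private static final int CURRENT_VERSION = 2;

    String name;
    String email;                             // field added in format version 2
    int versionToWrite = CURRENT_VERSION;     // demo-only knob, not part of the data

    public Customer() { }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(versionToWrite);         // the version always comes first
        out.writeUTF(name);
        if (versionToWrite >= 2) {
            out.writeUTF(email);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int version = in.readInt();
        if (version < 1 || version > CURRENT_VERSION) {
            throw new InvalidObjectException("unknown format version " + version);
        }
        name = in.readUTF();
        // Old data has no email, so fall back to a placeholder.
        email = (version >= 2) ? in.readUTF() : "unknown";
    }
}

class VersionDemo {
    // Writes the customer to memory and reads it back.
    static Customer roundTrip(Customer customer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(customer);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Customer) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Customer old = new Customer();
        old.name = "Ann";
        old.email = "ann@example.com";
        old.versionToWrite = 1;               // simulate data saved by an older release
        System.out.println(roundTrip(old).email);   // unknown
    }
}
```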

Just one more thing. If you are choosing this "technology" to communicate between different applications, then please just don't. What you want is JAXB. It's less compact, but more transparent, no compatibility issues, and just as flexible.

Hidden features

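Just to be complete, there is one more thing which makes this topic a bit more complicated. It's actually possible to use custom read/write methods without using the Externalizable interface at all: a class that is only Serializable can define private writeObject and readObject methods, and the serialization mechanism will call them instead of its default field handling. Unlike with Externalizable, the class description with its list of fields is still written, so you don't get the same compact output. If you want full control over the format, I would still go for Externalizable.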
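
For reference, this is roughly what those private hooks look like. It is a minimal sketch, and Session and HooksDemo are hypothetical names: the class is only Serializable, yet it decides at runtime whether the otherwise transient token is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: plain Serializable with private hooks, no Externalizable.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String user;
    private final boolean keepToken;
    private transient char[] token;        // skipped by defaultWriteObject()

    Session(String user, char[] token, boolean keepToken) {
        this.user = user;
        this.token = token;
        this.keepToken = keepToken;
    }

    // Looked up reflectively by ObjectOutputStream; must be private with this signature.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // writes user and keepToken
        if (keepToken) {
            out.writeObject(token);        // extra data, decided at runtime
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();            // restores user and keepToken
        if (keepToken) {
            token = (char[]) in.readObject();
        }
    }

    String user() { return user; }
    char[] token() { return token; }
}

class HooksDemo {
    // Writes the session to memory and reads it back.
    static Session roundTrip(Session session) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session dropped = roundTrip(new Session("bob", "t0k3n".toCharArray(), false));
        System.out.println(dropped.token() == null);   // true
    }
}
```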

Thank you for all the additional details that were not even asked in the question. — Grateful, Jun 30 '15 at 08:57

initramfs · Accepted Answer · 2015-06-30T07:22:35.463

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Foo implements Externalizable {
    private long userID;
    private String userName;
    private char[] userPassword;
    private int age;

    private boolean shouldSavePassword;

    // Externalizable needs a public no-arg constructor; declared explicitly for clarity.
    public Foo() {
    }

    public void setSavePassword(boolean shouldSavePassword) {
        this.shouldSavePassword = shouldSavePassword;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Every write here must be mirrored by the matching read in readExternal.
        out.writeLong(userID);
        out.writeObject(userName);
        out.writeBoolean(shouldSavePassword);

        if (shouldSavePassword) {
            out.writeObject(userPassword);
        }

        out.writeInt(age);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        userID = in.readLong();
        userName = (String) in.readObject();
        shouldSavePassword = in.readBoolean();

        if (shouldSavePassword) {
            userPassword = (char[]) in.readObject();
        }

        age = in.readInt();
    }
}

Note how the field userPassword is only serialized depending on the runtime value of shouldSavePassword. Had you declared the field transient, you would have made the decision of whether to serialize it a compile-time property, unable to be changed at runtime (unless via reflection).
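
For contrast, here is a minimal sketch of the transient route. TransientLogin and TransientDemo are hypothetical names, not part of the example above. The flag is set to true, yet default serialization never consults it and the password is dropped regardless.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical counterpart to Foo that relies on transient instead.
class TransientLogin implements Serializable {
    private static final long serialVersionUID = 1L;
    String userName;
    transient char[] userPassword;   // always skipped by default serialization
    boolean shouldSavePassword;      // default serialization never looks at this flag
}

class TransientDemo {
    // Writes the object to memory and reads it back.
    static TransientLogin roundTrip(TransientLogin login) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(login);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientLogin) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransientLogin login = new TransientLogin();
        login.userName = "alice";
        login.userPassword = "secret".toCharArray();
        login.shouldSavePassword = true;

        TransientLogin copy = roundTrip(login);
        System.out.println(copy.userName);               // alice
        System.out.println(copy.userPassword == null);   // true
    }
}
```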

The flexibility of Externalizable also allows you to determine your own serialization scheme, encrypting objects as necessary if they are sensitive.

Another use case would potentially be the option to attach a one-way hash to the end of the serialized data, so that corruption can be detected when it is read back. A field could determine whether or not to save the hash (since it's extra computation).

The bottom line is, transient doesn't give you any runtime control over how serialization of your object will be done. It merely fixes, at compile time, whether a field will or will not be serialized.
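
A rough sketch of that hash idea, under stated assumptions: Measurement and HashDemo are hypothetical names, SHA-256 is just one possible choice of hash, and the length read from the stream is trusted, which would need a sanity check for untrusted input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Measurement implements Externalizable {
    private static final long serialVersionUID = 1L;
    String sensor;
    double value;
    boolean appendHash = true;            // runtime switch for the extra work

    public Measurement() { }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Encode the fields into a buffer first so the same bytes can be hashed.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buffer);
        data.writeUTF(sensor);
        data.writeDouble(value);
        data.flush();
        byte[] payload = buffer.toByteArray();

        out.writeInt(payload.length);
        out.write(payload);
        out.writeBoolean(appendHash);
        if (appendHash) {
            out.write(sha256(payload));   // a SHA-256 digest is always 32 bytes
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        appendHash = in.readBoolean();
        if (appendHash) {
            byte[] stored = new byte[32];
            in.readFully(stored);
            if (!MessageDigest.isEqual(stored, sha256(payload))) {
                throw new InvalidObjectException("hash mismatch");
            }
        }
        DataInputStream data = new DataInputStream(new ByteArrayInputStream(payload));
        sensor = data.readUTF();
        value = data.readDouble();
    }

    private static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // SHA-256 ships with every Java platform
        }
    }
}

class HashDemo {
    static byte[] toBytes(Measurement m) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the restored object, or null if the hash check fails or the stream is corrupt.
    static Measurement fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Measurement) in.readObject();
        } catch (IOException e) {
            return null;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Hashing a buffered copy of the payload keeps the check independent of how ObjectOutput frames the data, at the cost of an extra copy in memory.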

Disclaimer: The example given above is a terrible scheme to save passwords, do not use it for any production application. Passwords should only be saved after passing through a password hashing function or PBKDF (password-based key derivation function) such as bcrypt, PBKDF2 or scrypt.

@Grateful Oh man, I must be sleepy. `implements` is what I mean, I changed it. — initramfs, Jun 30 '15 at 07:18
Aaaaah, okay! Thank you for that example. It's clear to me now. — Grateful, Jun 30 '15 at 07:19