
Apparently the Java serialization mechanism somehow manages to create an instance of a subclass using a superclass constructor. How is that possible?

Here's a test which demonstrates this:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.text.MessageFormat;

public class Test {

    public static class A {
        public final int a;

        public A() {
            this.a = 0;
            System.out.println(
                    MessageFormat.format(
                        "new A() constructor is called to create an instance of {0}.",
                    getClass().getName()));
        }

        public A(int a) {
            this.a = a;
            System.out.println(
                    MessageFormat.format(
                        "new A(int) constructor is called to create an instance of {0}.", 
                    getClass().getName()));
        }
    }

    public static class B extends A implements Serializable {
        public final int b;

        public B(int a, int b) {
            super(a);
            this.b = b;
            System.out.println(
                    MessageFormat.format(
                        "new B(int, int) constructor is called to create an instance of {0}.",
                    getClass().getName()));
        }

        @Override
        public String toString() {
            return "B [a=" + a + ", b=" + b + "]";
        }


    }

    public static void main(String[] args) throws Exception {

        B b1 = new B(10,20);

        System.out.println(b1);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try(ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(b1);
        }

        ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
        try (ObjectInputStream ois = new ObjectInputStream(bis)) {
            B b2 = (B)ois.readObject();
            System.out.println(b2);
        }
    }
}

Output:

new A(int) constructor is called to create an instance of Test$B.
new B(int, int) constructor is called to create an instance of Test$B.
B [a=10, b=20]
new A() constructor is called to create an instance of Test$B.
B [a=0, b=20]

(You can try it out live on Ideone).

As you can see, the A() constructor is called during deserialization to produce an instance of B. Under the hood, this happens in ObjectStreamClass.newInstance(), where the instance is created by a Constructor.newInstance() call. In the debugger, the constructor cons is Test$A():

[Screenshot from the debugger showing that cons is Test$A()]

Stepping out in the debugger, the created object is finally returned from ObjectInputStream.readObject(...) and is cast to B without problems.

So if I am not mistaken, it seems that the A() constructor was used (via reflection) to create an instance of B.

I wonder how this is possible.

lexicore
  • Use a byte code viewer: `new Type()` first creates an instruction `NEW Type` and then calls the constructor with `INVOKESPECIAL ...` on this instance. So a constructor always expects the object of the final type on the stack. Calling super does not create an object of B, it receives it. – CoronA Apr 15 '18 at 15:13
  • @CoronA This sounds logical, but I don't see how `cons.newInstance();` would know what the "final type" is. `cons` is `Test$A()`, I don't see where `B` is involved here at all. – lexicore Apr 15 '18 at 15:20
  • I would point to a field in `java.lang.reflect.Constructor`: `private volatile ConstructorAccessor constructorAccessor`. It is filled with an object of type `GeneratedSerializationConstructorAccessor1@...`. Its use can be found in `Constructor.newInstance`. – CoronA Apr 15 '18 at 16:04
  • @CoronA Getting closer, but I don't have the full picture yet. – lexicore Apr 15 '18 at 16:14
  • @CoronA I think this constructor accessor is just "reflection-optimized-by-bytecode-generation". Still unclear. – lexicore Apr 15 '18 at 16:17
  • Amazing example, I am also puzzled. – CoronA Apr 15 '18 at 17:42
  • Dear downvoters, please provide reasons for downvotes. – lexicore Apr 15 '18 at 17:46
  • @CoronA I think I've found it. The constructor seems to be manipulated. See `ObjectStreamClass.getSerializableConstructor`: it calls `sun.reflect.ReflectionFactory.newConstructorForSerialization(Class<?>, Constructor<?>)`, [source code](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/reflect/ReflectionFactory.java#ReflectionFactory.newConstructorForSerialization%28java.lang.Class%2Cjava.lang.reflect.Constructor%29). – lexicore Apr 15 '18 at 18:00
  • @CoronA That `newConstructorForSerialization` method receives `classToInstantiate` and `constructorToCall`. Then it apparently generates some magic byte code: there's literally a `MagicAccessorImpl` created. – lexicore Apr 15 '18 at 18:03
  • Oh sorry, I found it almost the same time ... and posted the solution. – CoronA Apr 15 '18 at 18:04
  • @CoronA No need to be sorry. Good job! – lexicore Apr 15 '18 at 18:05

2 Answers


I suspected that something must be special about the constructor cons, and I found the place where the ordinary constructor of A is replaced by a serialization constructor for B.

First I looked at where cons is initially set. In the serialization case, this happens in the constructor of ObjectStreamClass:

if (externalizable) {
   cons = getExternalizableConstructor(cl);
} else {
   cons = getSerializableConstructor(cl); //here
   ...

So I stepped through and found in ObjectStreamClass.getSerializableConstructor:

Constructor<?> cons = initCl.getDeclaredConstructor((Class<?>[]) null);
...
cons = reflFactory.newConstructorForSerialization(cl, cons); //this does change
cons.setAccessible(true);
return cons;

Putting a debug watch on cons.newInstance() shows:

  • before the marked line => type is A.
  • after the marked line => type is B.

This means that the constructor used for deserialization is not the ordinary constructor of A but a modified one generated for serialization, which is adapted to the final class B.
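The same effect can be reproduced outside of ObjectInputStream. The following is a minimal sketch, not the JDK's actual code path: the class names (SerializationConstructorDemo, A, B) are made up for illustration, and it relies on sun.reflect.ReflectionFactory, an unsupported JDK-internal API whose availability and behaviour may vary between JDK versions.

```java
import java.io.Serializable;
import java.lang.reflect.Constructor;
import sun.reflect.ReflectionFactory; // JDK-internal, unsupported API (assumption: present on your JDK)

class SerializationConstructorDemo {

    static class A {
        int a = 7; // assigned while A() runs
    }

    static class B extends A implements Serializable {
        int b = 42; // assigned only while a constructor of B runs

        B() {
            throw new IllegalStateException("B() should never run in this demo");
        }
    }

    /** Ordinary reflection: A's constructor creates an A. */
    static Object newViaPlainConstructor() {
        try {
            return A.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serialization constructor: allocates a B, then runs only A(). */
    static Object newViaSerializationConstructor() {
        try {
            Constructor<?> plain = A.class.getDeclaredConstructor();
            Constructor<?> forB = ReflectionFactory.getReflectionFactory()
                    .newConstructorForSerialization(B.class, plain);
            forB.setAccessible(true); // mirrors what ObjectStreamClass does
            return forB.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object plain = newViaPlainConstructor();
        Object viaSerialization = newViaSerializationConstructor();
        System.out.println(plain.getClass().getName());
        System.out.println(viaSerialization.getClass().getName());
        B b = (B) viaSerialization;
        System.out.println("a=" + b.a + ", b=" + b.b);
    }
}
```

If the sketch behaves as expected, the second object is a B even though B() never ran: its field a carries the value set by A(), while b keeps the default value 0 because B's initializers were skipped.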

CoronA
  • I've temporarily unaccepted your answer; I'll award you a bounty and then accept it again. Thank you very much for your persistence. – lexicore Apr 15 '18 at 18:09
  • Maybe the downvoter could give a hint where his/her expectation was not met? Thank you @lexicore, for pointing me to this interesting detail. – CoronA Apr 19 '18 at 05:15

During deserialization, the JVM doesn't call a constructor of the class being deserialized. But in order to create an instance of the deserialized class, it needs to initialize its superclasses first. So the JVM calls the no-arg constructor of the first parent that doesn't implement Serializable. It doesn't create an instance of your class in this constructor, though. If your parent class were Serializable, none of your constructors would be called at all (only the one of Object).
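The rule above can be checked with a small experiment. This is only an illustrative sketch: the class names (ConstructorCallsDemo, Plain, WithPlainParent, FullySerializable) and the `calls` log are hypothetical and not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ConstructorCallsDemo {

    /** Records which constructors ran, in order. */
    static final List<String> calls = new ArrayList<>();

    static class Plain { // not Serializable
        Plain() { calls.add("Plain()"); }
    }

    static class WithPlainParent extends Plain implements Serializable {
        final int value;
        WithPlainParent(int value) {
            this.value = value;
            calls.add("WithPlainParent(int)");
        }
    }

    static class FullySerializable implements Serializable {
        final int value;
        FullySerializable(int value) {
            this.value = value;
            calls.add("FullySerializable(int)");
        }
    }

    /** Serializes the object, then returns the constructors logged while reading it back. */
    static List<String> callsDuringDeserialization(Object original) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(original);
            }
            calls.clear(); // forget the calls made while building 'original'
            try (ObjectInputStream ois =
                         new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                ois.readObject();
            }
            return new ArrayList<>(calls);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callsDuringDeserialization(new WithPlainParent(1)));
        System.out.println(callsDuringDeserialization(new FullySerializable(2)));
    }
}
```

With a non-serializable parent, only that parent's no-arg constructor should show up in the log; for the fully serializable class the log should stay empty, since the only constructor that runs is the one of Object.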

  • `ObjectStreamClass.newInstance()` does call `Test$A()` constructor via reflection. I wonder how the result can be `B`. – lexicore Apr 15 '18 at 15:17
  • It isn't the final result. It's about how the JVM implements inheritance and serialization. To create an object, the JVM has to create all its parents. Serializable parents are created without constructor calls. But the JVM can't create a non-Serializable parent without a constructor call. – Bogdan Lukiyanchuk Apr 15 '18 at 15:20
  • So JVM creates not serializable parent using constructor and then continue deserializing without constructor calls – Bogdan Lukiyanchuk Apr 15 '18 at 15:21
  • Is serialization a part of JVM? I don't see any JVM magic in `ObjectInputStream` and co. Some reflection usage, but nothing too peculiar. – lexicore Apr 15 '18 at 15:23
  • *So JVM creates not serializable parent using constructor and then continue deserializing without constructor calls* - Ok, let's leave it with JVM. But how does it work that an instance created with the `Test$A()` constructor is an instance of `B`? Because this is what happens here. – lexicore Apr 15 '18 at 15:25
  • Test$A() method is called. But it doesn't mean that JVM after it doesn't do some internal logic. – Bogdan Lukiyanchuk Apr 15 '18 at 15:27
  • *But it doesn't mean that JVM after it doesn't do some internal logic.* - may be the case, but why, how and where exactly? Otherwise its just "something magical happens". – lexicore Apr 15 '18 at 15:29
  • Deserialization in the JVM is a hack because it doesn't call a constructor of the object being deserialized. So it does something else to create the object. When it does that other thing, it calls the parent constructor. It's no more of a problem than deserialization itself. – Bogdan Lukiyanchuk Apr 15 '18 at 15:30
  • In this case it does call the constructor of the superclass via reflection. I am asking, how the created instance can be an instance of the subclass. – lexicore Apr 15 '18 at 15:37
  • So you don't see a problem in creating an object without calling a constructor when you have just one class that implements `Serializable`, or would you say that the JVM calls the constructor of the `Object` class and makes it represent itself as your class? – Bogdan Lukiyanchuk Apr 15 '18 at 15:42
  • @Bogdan: Put the reflective call of `Constructor.newInstance` at any other place and it will create an instance of `A`. There is no hint in the byte code why the same constructor may create an instance of `B` here. And byte code is the truth. It does not make sense that byte code is interpreted another way in the class `ObjectStreamClass`. – CoronA Apr 15 '18 at 15:53
  • @BogdanLukiyanchuk *So you don't see a problem...* - sorry I don't really understand your comment. I do have a problem which, I think, I've described in good detail. I don't have a problem with calling the superclass constructor, I have a problem with the returned instance being instance of subclass. – lexicore Apr 15 '18 at 16:16
  • @CoronA I can rephrase what you are asking: how does the JVM determine the class of an object at runtime? I don't know the answer to this question. It's possible that implementations differ. But I can imagine that an object has a link to its `Class` object, and when `readObject` is called the JVM sets this link properly. If you want to know for sure, you need to go deeper and read the specification and source code of the JVM. – Bogdan Lukiyanchuk Apr 15 '18 at 16:16
  • @lexicore I meant that deserialization has problems without inheritance and superclass because it creates an object without calling its constructor. In other discussion people said that actually a constructor of `Object` is called because it doesn't implement `Serializable`. So before involving inheritance you need to understand what is going on without inheritance. – Bogdan Lukiyanchuk Apr 15 '18 at 16:21
  • @lexicore Bogdan is right. The constructor of `A` isn't returning an instance of `B`, that's just the illusion given by your logs. The constructor of `A` is being called, then the values of `B` are being written to it. You don't log the "values being written" part, so your logs give the illusion of `A()` returning an instance of `B`. – Vince Apr 15 '18 at 16:49
  • @Vince: If you just check the type of the result after the constructor call with a debugger, you will get type `B`. And this is pretty logical, because writing values of `B` to an object of type `A` would be inconsistent. – CoronA Apr 15 '18 at 17:15
  • @VinceEmigh *The constructor of `A` isn't returning an instance of `B`* - somehow it does in this case. How would you set values to fields of `B` if it's not an instance of `B`? Also the returned value is cast to `B` without any problems. – lexicore Apr 15 '18 at 17:17
  • @CoronA that has nothing to do with Serialization. If `B` extends `A`, a debugger shows in constructor of class `A` that `this` is instance of `B`. – Bogdan Lukiyanchuk Apr 15 '18 at 17:22
  • @lexicore Are you able to step through to `defaultReadObject()` to see when that occurs during this initialization process? I'm on my phone, so I can't test this, and your image didn't show any signs of `A()` returning `B`. I don't think the debugger is showing you the full picture. `defaultReadObject()` is documented to occur if you didn't declare `readObject()` – Vince Apr 15 '18 at 17:27
  • @Bogdan: The constructor of `B` is not called. And the one of `A` is called and the result is `B`. Please use the debugger to verify. @VinceEmigh: `defaultReadObject` is not on the stacktrace. – CoronA Apr 15 '18 at 17:32
  • @VinceEmigh I don't actually see values of local variables in the debugger. But "stepping out" a few times, it's basically just the value returned and cast/assigned to `B`. I can't completely eliminate the possibility that `cons.newInstance()` actually returns an instance of `A`, and it is then somehow converted to `B`, but I don't see where it is done in code or when tracing in the debugger. – lexicore Apr 15 '18 at 17:32
  • Debugging with Intellij shows up `B` the first time the value is written to a local variable. – CoronA Apr 15 '18 at 17:35
  • @lexicore Once again, could you step through `defaultReadObject()`? If not, I'd say it's safer to assume your debugger isn't capturing the entire process, rather than assume what you currently are.. – Vince Apr 15 '18 at 17:36
  • @VinceEmigh I have stepped through `readObject` up to the place the instance is created and back to the cast and assignment of the returned value. I can confirm the created instance is `Test$B`. `defaultReadObject()` is not called at all. – lexicore Apr 15 '18 at 17:41
  • @VinceEmigh Sorry, I don't accept this "your debugger isn't capturing the entire process". Maybe it does not, but the spec is quite clear that subclass constructor is to be called - and the result is still an instance of the subclass. I see it as very well established that the superclass constructor is somehow used to create an instance of the subclass. It is both stated in the spec and heavily indicated by my debugging/tracing so far. – lexicore Apr 15 '18 at 17:44
  • @lexicore The spec says "*Note that field initializers and constructors are not executed for serializable classes during deserialization*", so the subclass constructor is *not* called. I now realize my answer doesn't cover the question, and I'm currently digging into it more. – Vince Apr 15 '18 at 17:46
  • @lexicore The first line of output in your post. `new A(int) constructor is called to create an instance of Test$B.` Object `b1` of class `B` is not created yet but `this` already knows that it has class `B` not `A`. So JVM passes to the constructor the real class of object being created. – Bogdan Lukiyanchuk Apr 15 '18 at 18:04