
I am trying to compare two byte[] arrays that result from serializing the same object:

  • one byte[] is created by serializing the object,
  • the other by deserializing that first byte[] and then serializing the reconstructed object again.

I do not understand how these two arrays can be different. Deserializing the first byte[] should reconstruct the original object, and serializing that object should produce the same bytes as serializing the original one. So the two byte[] arrays should be identical. However, under certain circumstances they are different, apparently.

The object I am serializing (State) holds a list of another object (MapWrapper) which in turn holds a single collection. Depending on the collection, I get different results from my comparison code.

Here is the MCVE:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Test {

    public static void main(String[] args) {

        State state = new State();
        state.maps.add(new MapWrapper());

        byte[] pBA = stateToByteArray(state);
        State pC = byteArrayToState(pBA);
        byte[] zero = stateToByteArray(pC);
        System.out.println(Arrays.equals(pBA, zero)); // see output below
        State pC2 = byteArrayToState(pBA);
        byte[] zero2 = stateToByteArray(pC2);
        System.out.println(Arrays.equals(zero2, zero)); // always true
    }

    public static byte[] stateToByteArray(State s) {

        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(s);
            oos.flush(); // ensure all buffered bytes reach bos before reading them
            return bos.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static State byteArrayToState(byte[] bytes) {

        ObjectInputStream ois;
        try {
            ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            return (State) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return null;
    }
}

class State implements Serializable {

    private static final long serialVersionUID = 1L;

    List<MapWrapper> maps = new ArrayList<>();
}

class MapWrapper implements Serializable {

    private static final long serialVersionUID = 1L;

    // Different options, choose one!
//  List<Integer> ints = new ArrayList<>();       // prints true
//  List<Integer> ints = new ArrayList<>(3);      // prints true
//  Map<String, Integer> map = new HashMap<>();   // prints true
//  Map<String, Integer> map = new HashMap<>(2);  // prints false
}

For some reason, if MapWrapper contains a HashMap (or LinkedHashMap) that is constructed with an explicit initial capacity, the first serialization gives a different result than a serialization-deserialization-serialization round trip.

I added a second iteration of deserialization-serialization and compared its output to the first's. The two are always equal; the difference manifests only after the first iteration.

Note that I must create a MapWrapper and add it to the list in State, as done at the start of main, to cause this.

As far as I know, the initial capacity is a performance parameter only. Using the default or a specified value should not change behavior or functionality.

I am using JDK 1.8.0_25 on Windows 7.

Why does this happen?

user1803551
  • Why are you surprised? If using the "initial capacity" constructor would not have an effect on your map, why would that constructor exist in the first place? What I mean is: maps are actually pretty complex thingies; and as they deal with multiple objects added to them, the question how/if the map "capacity" is handled must be a "complicated" thing, too. So why do you expect that two objects that were "setup" in slightly different ways do the exact same thing? – GhostCat Jul 28 '16 at 11:48
  • @GhostCat Then a list should exhibit the same behavior. Besides, look at the code, I'm changing the object *prior* to any serialization, not between them. I expect to get the same object after subsequent serialization-deserialization. It is a method of deep copying. I also updated the code with another iteration of de/serialization and it is always the same as the first iteration. – user1803551 Jul 28 '16 at 11:52

1 Answer


The following line and its comment in HashMap's readObject source code explain the difference:

s.readInt();                // Read and ignore number of buckets

Indeed, looking at the hex of the bytes, the difference is between the number 2 (your configured number of buckets) and the number 16 (the default number of buckets). I haven't verified that this is what this particular byte means, but it would be quite a coincidence if it were something else, considering it is the only difference.

<snip> 08 00 00 00 02 00 00 00 00 78 78   // Original
<snip> 08 00 00 00 10 00 00 00 00 78 78   // Deserialized+serialized.
                   ^
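For what it's worth, the State/MapWrapper wrappers aren't needed to reproduce this; a bare HashMap shows the same effect. Here is a minimal sketch (the Demo class and its ser/deser helpers are my own names, not part of your code, and this assumes the Java 8 readObject behavior described above):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;

public class Demo {

    // Serialize any object to a byte[]; checked exceptions are wrapped for brevity.
    static byte[] ser(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(o);
            oos.close(); // close flushes all buffered bytes into bos
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize a byte[] back into an object.
    static Object deser(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Default capacity: round-trips byte-for-byte, because both the original
        // (threshold == 0) and the deserialized copy write the default bucket
        // count (16) in writeObject.
        byte[] a = ser(new HashMap<String, Integer>());
        System.out.println(Arrays.equals(a, ser(deser(a)))); // true

        // Explicit capacity 2: the original writes 2 as its bucket count, but
        // readObject discards that number, so the copy writes 16 again.
        byte[] c = ser(new HashMap<String, Integer>(2));
        System.out.println(Arrays.equals(c, ser(deser(c)))); // false
    }
}
```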
Andy Turner
  • Then why, when I put 16 (same as default, as per the source: `1 << 4; // aka 16`), does it still give different results? – user1803551 Jul 28 '16 at 12:00
  • Because specifying 16 causes the `threshold` to be set, whereas not specifying a capacity leaves it at its initial value; and in [`readObject`](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/HashMap.java#1385), `threshold` is only set again if the map is non-empty. Try putting something into the map, and you'll see the bytes come out the same again. – Andy Turner Jul 28 '16 at 12:04
  • So when I write and read it, where is the difference in the resulting map compared to the original? Subsequent writes and reads don't change it any further. – user1803551 Jul 28 '16 at 12:08
  • Can you please answer my comment so I can fully understand this behavior? It really seems like unintentional behavior. – user1803551 Aug 04 '16 at 18:00
  • I don't have any special insight to offer; I just read the source code, which I'd encourage you to do too to find out what you really want to know (I don't know what that is; right now it feels like [the XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)). I also found [this answer](http://stackoverflow.com/a/28530760/3788176) which suggests that not deserializing the number of buckets is a security measure to prevent a DoS attack. – Andy Turner Aug 04 '16 at 19:23