
I am trying to compare two byte[] arrays that result from serializing the same object:

  • one byte[] is created by serializing the object,
  • the other by deserializing that first byte[] and then serializing the reconstructed object again.

I do not understand how these two arrays can be different. Deserializing the first byte[] should reconstruct the original object, and serializing that object should produce the same bytes as serializing the original one. So the two byte[] arrays should be identical. However, under certain circumstances they are different, apparently.

The object I am serializing (State) holds a list of another object (MapWrapper) which in turn holds a single collection. Depending on the collection, I get different results from my comparison code.

Here is the MCVE:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Test {

    public static void main(String[] args) {

        State state = new State();
        state.maps.add(new MapWrapper());

        byte[] pBA = stateToByteArray(state);
        State pC = byteArrayToState(pBA);
        byte[] zero = stateToByteArray(pC);
        System.out.println(Arrays.equals(pBA, zero)); // see output below
        State pC2 = byteArrayToState(pBA);
        byte[] zero2 = stateToByteArray(pC2);
        System.out.println(Arrays.equals(zero2, zero)); // always true
    }

    public static byte[] stateToByteArray(State s) {

        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(s);
            oos.flush(); // ensure all buffered bytes reach bos before reading them
            return bos.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static State byteArrayToState(byte[] bytes) {

        ObjectInputStream ois;
        try {
            ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            return (State) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return null;
    }
}

class State implements Serializable {

    private static final long serialVersionUID = 1L;

    List<MapWrapper> maps = new ArrayList<>();
}

class MapWrapper implements Serializable {

    private static final long serialVersionUID = 1L;

    // Different options, choose one!
//  List<Integer> ints = new ArrayList<>();       // prints true
//  List<Integer> ints = new ArrayList<>(3);      // prints true
//  Map<String, Integer> map = new HashMap<>();   // prints true
//  Map<String, Integer> map = new HashMap<>(2);  // prints false
}

For some reason, if MapWrapper contains a HashMap (or LinkedHashMap) that is constructed with an explicit initial capacity, the first serialization gives a different result than a serialization-deserialization-serialization round trip.

I added a second iteration of deserialization-serialization and compared its output to the first's. The two are always equal; the difference manifests only after the first iteration.

Note that I must create a MapWrapper and add it to the list in State, as done at the start of main, to cause this.

As far as I know, the initial capacity is a performance parameter only. Using the default or a specified value should not change behavior or functionality.

I am using JDK 1.8.0_25 on Windows 7.

Why does this happen?

user1803551
  • Why are you surprised? If using the "initial capacity" constructor would not have an effect on your map, why would that constructor exist in the first place? What I mean is: maps are actually pretty complex thingies; and as they deal with multiple objects added to them, the question how/if the map "capacity" is handled must be a "complicated" thing, too. So why do you expect that two objects that were "setup" in slightly different ways do the exact same thing? – GhostCat Jul 28 '16 at 11:48
  • @GhostCat Then a list should exhibit the same behavior. Besides, look at the code, I'm changing the object *prior* to any serialization, not between them. I expect to get the same object after subsequent serialization-deserialization. It is a method of deep copying. I also updated the code with another iteration of de/serialization and it is always the same as the first iteration. – user1803551 Jul 28 '16 at 11:52

1 Answer


The following line and its comment in HashMap's readObject source code explain the difference:

s.readInt();                // Read and ignore number of buckets

Indeed, looking at the hex of the bytes, the difference is between the number 2 (your configured number of buckets) and the number 16 (the default number of buckets). I haven't verified that this is what this particular byte means, but it would be quite a coincidence if it were something else, considering it is the only difference.

<snip> 08 00 00 00 02 00 00 00 00 78 78   // Original
<snip> 08 00 00 00 10 00 00 00 00 78 78   // Deserialized+serialized.
                   ^
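For what it's worth, the State/MapWrapper wrappers aren't needed to reproduce this; a bare HashMap shows the same effect. Here is a minimal sketch (the Demo class and its ser/deser helpers are my own names, not part of your code, and this assumes the Java 8 readObject behavior described above):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;

public class Demo {

    // Serialize any object to a byte[]; checked exceptions are wrapped for brevity.
    static byte[] ser(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(o);
            oos.close(); // close flushes all buffered bytes into bos
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize a byte[] back into an object.
    static Object deser(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Default capacity: round-trips byte-for-byte, because both the original
        // (threshold == 0) and the deserialized copy write the default bucket
        // count (16) in writeObject.
        byte[] a = ser(new HashMap<String, Integer>());
        System.out.println(Arrays.equals(a, ser(deser(a)))); // true

        // Explicit capacity 2: the original writes 2 as its bucket count, but
        // readObject discards that number, so the copy writes 16 again.
        byte[] c = ser(new HashMap<String, Integer>(2));
        System.out.println(Arrays.equals(c, ser(deser(c)))); // false
    }
}
```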
Andy Turner
  • Then why, when I put 16 (same as default, as per the source: `1 << 4; // aka 16`), does it still give different results? – user1803551 Jul 28 '16 at 12:00
  • Because specifying 16 causes the `threshold` to be set, whereas not specifying a capacity leaves it at its initial value; and in [`readObject`](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/HashMap.java#1385), `threshold` is only set again if the map is non-empty. Try putting something into the map, and you'll see the bytes come out the same again. – Andy Turner Jul 28 '16 at 12:04
  • So when I write and read it, where is the difference in the resulting map compared to the original? Subsequent writes and reads don't change it any further. – user1803551 Jul 28 '16 at 12:08
  • Can you please answer my comment so I can fully understand this behavior? It really seems like unintentional behavior. – user1803551 Aug 04 '16 at 18:00
  • I don't have any special insight to offer; I just read the source code, which I'd encourage you to do too to find out what you really want to know (I don't know what that is; right now it feels like [the XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)). I also found [this answer](http://stackoverflow.com/a/28530760/3788176) which suggests that not deserializing the number of buckets is a security measure to prevent a DoS attack. – Andy Turner Aug 04 '16 at 19:23