I've just lost a couple of hours debugging my app, and I believe I've stumbled upon a Java bug (yes, another one o_O)... sniff... I hope it isn't one, because that would be sad :(
I'm doing the following:

- Creating an `EnumSet` `mask` with some flags
- Serializing it (with `ObjectOutputStream.writeObject(mask)`)
- Clearing and setting some other flags in `mask`
- Serializing it again
Expected result: the second serialized object differs from the first one (it reflects the changes made to the instance).
Obtained result: the second serialized object is an exact copy of the first one.
The code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.EnumSet;

import org.junit.Test;

public class EnumSetSerializeTest {

    enum MyEnum {
        ONE, TWO
    }

    @Test
    public void testEnumSetSerialize() throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream stream = new ObjectOutputStream(bos);

        EnumSet<MyEnum> mask = EnumSet.noneOf(MyEnum.class);
        mask.add(MyEnum.ONE);
        mask.add(MyEnum.TWO);
        System.out.println("First serialization: " + mask);
        stream.writeObject(mask);

        // Mutate the same instance and write it again.
        mask.clear();
        System.out.println("Second serialization: " + mask);
        stream.writeObject(mask);
        stream.close();

        ObjectInputStream istream = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println("First deserialized " + istream.readObject());
        System.out.println("Second deserialized " + istream.readObject());
    }
}
```
It prints:

```
First serialization: [ONE, TWO]
Second serialization: []
First deserialized [ONE, TWO]
Second deserialized [ONE, TWO]   <<<<<< Expecting [] here!!!!
```
Am I using `EnumSet` incorrectly? Do I have to create a new instance each time instead of clearing it?
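For reference, the output above matches how `ObjectOutputStream` is documented to behave: it remembers every object already written to the stream, and writing the same reference again only emits a back-reference to the first copy, so later changes to the instance are not picked up. `ObjectOutputStream.reset()` discards that memory. The following sketch (the class name `EnumSetResetDemo` is made up for illustration) writes the same `EnumSet` twice, once without and once with a `reset()` between the writes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class EnumSetResetDemo {
    enum MyEnum { ONE, TWO }

    // Writes the same EnumSet twice, clearing it in between, and reads both copies back.
    // If callReset is true, ObjectOutputStream.reset() is called before the second write.
    static List<Object> roundTrip(boolean callReset) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            EnumSet<MyEnum> mask = EnumSet.of(MyEnum.ONE, MyEnum.TWO);
            out.writeObject(mask);
            mask.clear();
            if (callReset) {
                // Forget previously written objects so the next write serializes the current state.
                out.reset();
            }
            out.writeObject(mask);
        }
        List<Object> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            result.add(in.readObject());
            result.add(in.readObject());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without reset: " + roundTrip(false)); // [[ONE, TWO], [ONE, TWO]]
        System.out.println("with reset:    " + roundTrip(true));  // [[ONE, TWO], []]
    }
}
```

Creating a new `EnumSet` for every write also avoids the back-reference. Note that after a `reset()` the stream re-sends class descriptors as well, which adds bytes to each subsequent write.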
Thanks for your input!
**** UPDATE ****
My initial idea was to use an `EnumSet` as a mask indicating which fields are present or absent in the message that follows, as a sort of bandwidth and CPU usage optimization. That was very wrong!!! An `EnumSet` takes ages to serialize (about 2.5 s for 1,000,000 instances in the benchmark below), and each instance takes about 30 (!!!) bytes. So much for the space economy :)
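The mask-then-fields layout itself does not need `EnumSet`; it can be written with primitive calls only. A minimal sketch, assuming a hypothetical message with two optional fields (`Field`, `encode` and `decode` are illustrative names, not from the post). It uses `DataOutputStream`/`DataInputStream`, but `ObjectOutputStream` and `ObjectInputStream` implement the same `DataOutput`/`DataInput` methods:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PresenceMaskDemo {
    // Hypothetical optional fields of a message (not from the post).
    enum Field { NAME, AGE }

    static int bit(Field f) {
        return 1 << f.ordinal();
    }

    // Writes the presence mask first, then only the fields whose bit is set.
    static byte[] encode(String name, Integer age) throws IOException {
        int mask = 0;
        if (name != null) mask |= bit(Field.NAME);
        if (age != null) mask |= bit(Field.AGE);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(mask);
            if (name != null) out.writeUTF(name);
            if (age != null) out.writeInt(age);
        }
        return bos.toByteArray();
    }

    // Reads the mask, then reads exactly the fields it announces, in the same order.
    static String decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int mask = in.readInt();
            String name = null;
            Integer age = null;
            if ((mask & bit(Field.NAME)) != 0) name = in.readUTF();
            if ((mask & bit(Field.AGE)) != 0) age = in.readInt();
            return "name=" + name + ", age=" + age;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] msg = encode("bob", null);
        System.out.println(msg.length + " bytes -> " + decode(msg)); // 9 bytes -> name=bob, age=null
    }
}
```

The absent `age` costs nothing on the wire: the message is the 4-byte mask plus the 5-byte `writeUTF` encoding of `"bob"`.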
In a nutshell, while `ObjectOutputStream` is very fast for primitive types (as I had already found out in a small test here: https://stackoverflow.com/a/33753694), it is painfully slooooow and inefficient with objects, especially small ones...
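The size difference can be reproduced with a small sketch that writes many freshly created `EnumSet`s with `writeObject`, writes the equivalent bit masks with `writeInt`, and compares `ByteArrayOutputStream.size()`. The enum `Flag` and the class name are hypothetical, and exact byte counts depend on the enum and the JDK, so only the comparison is meaningful:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.EnumSet;

public class SerializedSizeDemo {
    // Hypothetical enum; the post's TestEnum is not shown.
    enum Flag { A, B, C, D }

    // Writes n freshly created EnumSets with writeObject and returns the stream size in bytes.
    static int bytesForEnumSets(int n) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            for (int i = 0; i < n; i++) {
                // A new instance each time, so no back-reference shortcut applies.
                out.writeObject(EnumSet.of(Flag.A, Flag.C));
            }
        }
        return bos.size();
    }

    // Writes the same flags n times as an int bit mask (A = bit 0, C = bit 2).
    static int bytesForInts(int n) throws IOException {
        int mask = (1 << Flag.A.ordinal()) | (1 << Flag.C.ordinal());
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            for (int i = 0; i < n; i++) {
                out.writeInt(mask);
            }
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        int n = 100_000;
        System.out.println("EnumSet: " + bytesForEnumSets(n) / (double) n + " bytes per set");
        System.out.println("int:     " + bytesForInts(n) / (double) n + " bytes per set");
    }
}
```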
So I worked around it by writing my own `EnumSet`-like class backed by an `int`, and serializing/deserializing the `int` directly (not the object).
```java
static class MyEnumSet<T extends Enum<T>> {
    // Bit i is set when the constant with ordinal i is in the set.
    private int mask = 0;

    @Override
    public boolean equals(Object o) {
        if (o == null || getClass() != o.getClass()) return false;
        return mask == ((MyEnumSet<?>) o).mask;
    }

    @Override
    public int hashCode() {
        return mask;
    }

    private MyEnumSet(int mask) {
        this.mask = mask;
    }

    // clz is not used; it only lets the compiler infer T.
    public static <T extends Enum<T>> MyEnumSet<T> noneOf(Class<T> clz) {
        return new MyEnumSet<T>(0);
    }

    // Rebuilds a set from an int previously obtained through mask().
    public static <T extends Enum<T>> MyEnumSet<T> fromMask(Class<T> clz, int mask) {
        return new MyEnumSet<T>(mask);
    }

    public int mask() {
        return mask;
    }

    // Only correct for ordinals 0..31: for larger ordinals, 1 << ordinal wraps around.
    public MyEnumSet<T> add(T flag) {
        mask = mask | (1 << flag.ordinal());
        return this;
    }

    public void clear() {
        mask = 0;
    }
}
```
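```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.junit.Assert;
import org.junit.Test;

// Members of the same test class as MyEnumSet.
// TestEnum, logger and TicToc are defined elsewhere (not shown in the post).
private final int N = 1000000;

@Test
public void testSerializeMyEnumSet() throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream(N * 100);
    ObjectOutputStream out = new ObjectOutputStream(bos);

    // Build N masks with a random subset of flags (fixed seed for reproducibility).
    List<MyEnumSet<TestEnum>> masks = new ArrayList<>();
    Random r = new Random(132477584521L);
    for (int i = 0; i < N; i++) {
        MyEnumSet<TestEnum> mask = MyEnumSet.noneOf(TestEnum.class);
        for (TestEnum f : TestEnum.values()) {
            if (r.nextBoolean()) {
                mask.add(f);
            }
        }
        masks.add(mask);
    }

    // Serialize only the int behind each set, not the object itself.
    logger.info("Serializing " + N + " myEnumSets");
    long tic = TicToc.tic();
    for (MyEnumSet<TestEnum> mask : masks) {
        out.writeInt(mask.mask());
    }
    TicToc.toc(tic);
    out.close();
    logger.info("Size: " + bos.size() + " (" + (bos.size() / N) + "b per object)");

    // Rebuild each set from the int that was written.
    logger.info("Deserializing " + N + " myEnumSets");
    @SuppressWarnings("unchecked")
    MyEnumSet<TestEnum>[] deserialized = new MyEnumSet[masks.size()];
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
    tic = TicToc.tic();
    for (int i = 0; i < deserialized.length; i++) {
        deserialized[i] = MyEnumSet.fromMask(TestEnum.class, in.readInt());
    }
    TicToc.toc(tic);

    Assert.assertArrayEquals(masks.toArray(), deserialized);
}
```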
It's about 130x faster during serialization (0.019 s vs. 2.506 s) and about 23x faster during deserialization (0.021 s vs. 0.489 s)...
MyEnumSets:

```
17/12/15 11:59:31 INFO - Serializing 1000000 myEnumSets
17/12/15 11:59:31 INFO - Elapsed time is 0.019 s
17/12/15 11:59:31 INFO - Size: 4019539 (4b per object)
17/12/15 11:59:31 INFO - Deserializing 1000000 myEnumSets
17/12/15 11:59:31 INFO - Elapsed time is 0.021 s
```

Regular EnumSets:

```
17/12/15 11:59:48 INFO - Serializing 1000000 enumSets
17/12/15 11:59:51 INFO - Elapsed time is 2.506 s
17/12/15 11:59:51 INFO - Size: 30691553 (30b per object)
17/12/15 11:59:51 INFO - Deserializing 1000000 enumSets
17/12/15 11:59:51 INFO - Elapsed time is 0.489 s
```
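It's not as safe, though. For example, it will not work for enums with more than 32 constants: an `int` has only 32 bits, and because Java uses only the low 5 bits of an `int` shift distance, `1 << 32` evaluates to `1`, so the 33rd constant would silently share a bit with the first one.
How can I ensure, when a `MyEnumSet` is created, that the enum has at most 32 values?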
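One possible approach, sketched under the assumption that failing fast at creation time is acceptable: `Class.getEnumConstants()` returns the constants of an enum type, so the factory methods can compare its length with `Integer.SIZE` (32) and throw otherwise. `IntEnumSet`, `checkFits` and the demo enums `Small` and `Big` are hypothetical names for illustration:

```java
public class IntEnumSet<T extends Enum<T>> {
    // Hypothetical demo enums: Small fits in an int mask, Big (33 constants) does not.
    enum Small { A, B, C }
    enum Big {
        C0, C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16,
        C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C31, C32
    }

    private int mask;

    private IntEnumSet(int mask) {
        this.mask = mask;
    }

    // Throws if the enum has more constants than an int has bits.
    private static void checkFits(Class<?> type) {
        int count = type.getEnumConstants().length;
        if (count > Integer.SIZE) {
            throw new IllegalArgumentException(type.getName() + " has " + count
                    + " constants, but an int mask holds at most " + Integer.SIZE);
        }
    }

    public static <T extends Enum<T>> IntEnumSet<T> noneOf(Class<T> type) {
        checkFits(type);
        return new IntEnumSet<>(0);
    }

    public static <T extends Enum<T>> IntEnumSet<T> fromMask(Class<T> type, int mask) {
        checkFits(type);
        return new IntEnumSet<>(mask);
    }

    public IntEnumSet<T> add(T flag) {
        mask |= 1 << flag.ordinal(); // ordinal < 32 is guaranteed by checkFits in the factories
        return this;
    }

    public int mask() {
        return mask;
    }

    public static void main(String[] args) {
        System.out.println(noneOf(Small.class).add(Small.C).mask()); // 4
        try {
            noneOf(Big.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

`getEnumConstants()` returns a fresh copy of the array on every call, so if sets are created in a hot loop it may be worth caching the check per class. Backing the set with a `long` instead would raise the limit to 64 constants.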