I'm trying to serialize a HashSet but I'm having no luck. Whenever I deserialize the data, I get an empty HashSet back. A List, however, works fine. Example code:

[Serializable()]
public class MyClass : ISerializable
{
    public MyClass(SerializationInfo info, StreamingContext ctxt)
    {
        HashSet<string> hashset = (HashSet<string>)info.GetValue("hashset", typeof(HashSet<string>));
        List<string> list = (List<string>)info.GetValue("list", typeof(List<string>));
        Console.WriteLine("Printing Hashset:");
        foreach (string line in hashset)
        {
            Console.WriteLine(line);
        }
        Console.WriteLine("Printing List:");
        foreach (string line in list)
        {
            Console.WriteLine(line);
        }
    }

    public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
    {
        HashSet<string> hashset = new HashSet<string>();
        hashset.Add("One");
        hashset.Add("Two");
        hashset.Add("Three");
        info.AddValue("hashset", hashset);
        List<string> list = new List<string>();
        list.Add("One");
        list.Add("Two");
        list.Add("Three");
        info.AddValue("list", list);
    }
}

And when run, it prints out:

Printing Hashset:
Printing List:
One
Two
Three

So the List works fine, but the HashSet comes back empty. A little stuck - can anyone see what I'm doing wrong? Thanks

Frederik

2 Answers

6

Update:

As Hans Passant stated, there is a simple workaround: just call HashSet.OnDeserialization manually.

var hashset = (HashSet<string>)info.GetValue("hashset", typeof(HashSet<string>));
hashset.OnDeserialization(this);

The same call also helps with other generic collections.


As far as I can see, this is probably a bug in the HashSet<T> implementation. The HashSet is serialized into the SerializationInfo correctly:

public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
{
  if (info == null)
  {
    throw new ArgumentNullException("info");
  }
  info.AddValue("Version", this.m_version);
  info.AddValue("Comparer", this.m_comparer, typeof(IEqualityComparer<T>));
  info.AddValue("Capacity", (this.m_buckets == null) ? 0 : this.m_buckets.Length);
  if (this.m_buckets != null)
  {
    T[] array = new T[this.m_count];
    this.CopyTo(array);
    info.AddValue("Elements", array, typeof(T[]));
  }
}

The SerializationInfo is also restored correctly. You can check this yourself in the debugger by inspecting (((System.Collections.Generic.HashSet<string>)(info.m_data[0]))).m_siInfo.m_data[3]. The HashSet itself, however, does not restore its state; all its deserialization constructor does is store the SerializationInfo:

protected HashSet(SerializationInfo info, StreamingContext context)
{
  this.m_siInfo = info;
}

You can check (hashset).m_siInfo.MemberValues[3]: the values were correctly restored by the formatter but not "interpreted" by the HashSet.
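The deferred pattern described here can be sketched with a small stand-in type. This is only an illustration: DeferredSet is a hypothetical class that mimics the behaviour, not the real HashSet<T> source, and the hand-built SerializationInfo in Main stands in for what a formatter would normally do.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical type for illustration only; it imitates the "store the info now,
// rebuild later" pattern and is NOT the actual HashSet<T> implementation.
[Serializable]
public sealed class DeferredSet : ISerializable, IDeserializationCallback
{
    private readonly List<string> items = new List<string>();
    private SerializationInfo pendingInfo; // held until OnDeserialization runs

    public DeferredSet() { }

    // Deserialization constructor: only remembers the info, restores nothing yet.
    public DeferredSet(SerializationInfo info, StreamingContext context)
    {
        pendingInfo = info;
    }

    public int Count { get { return items.Count; } }

    public void Add(string item)
    {
        if (!items.Contains(item)) items.Add(item);
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Elements", items.ToArray(), typeof(string[]));
    }

    // The elements only become visible once this callback has run.
    public void OnDeserialization(object sender)
    {
        if (pendingInfo == null) return; // nothing pending, or already restored
        var elements = (string[])pendingInfo.GetValue("Elements", typeof(string[]));
        foreach (string element in elements) Add(element);
        pendingInfo = null;
    }
}

public static class DeferredDemo
{
    public static void Main()
    {
        var original = new DeferredSet();
        original.Add("One");
        original.Add("Two");
        original.Add("Three");

        // Assumption: building the SerializationInfo by hand replaces a real formatter.
        var info = new SerializationInfo(typeof(DeferredSet), new FormatterConverter());
        var context = new StreamingContext(StreamingContextStates.All);
        original.GetObjectData(info, context);

        var restored = new DeferredSet(info, context);
        Console.WriteLine(restored.Count); // 0: the callback has not run yet
        restored.OnDeserialization(null);
        Console.WriteLine(restored.Count); // 3: state rebuilt from the stored info
    }
}
```

A formatter only invokes the callback after the whole object graph has been read, which is why the set still looks empty inside the containing class's deserialization constructor unless OnDeserialization is called by hand. On recent .NET versions some of these serialization types may produce obsolescence warnings.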

Dictionary<TKey,TValue> and, for example, LinkedList<T> have a similar problem.

List<T> (and similar array-based collections such as Stack<T>) has no such problem, since it is serialized as a plain array without any special logic.

A workaround was posted by Hans Passant.

IMHO, BinaryFormatter is not a particularly good or efficient way to store values. You can try DataContractSerializer (it can handle such types) or go with serialization helpers such as protobuf-net, Json.NET, etc. See "Why is binary serialization faster than xml serialization?" and "Performance Tests of Serializations used by WCF Bindings".
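A minimal sketch of the DataContractSerializer route, assuming a plain HashSet<string> round-tripped through a MemoryStream. The DataContractDemo class and its RoundTrip helper are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

public static class DataContractDemo
{
    // Hypothetical helper: writes the set as XML into memory and reads it back.
    public static HashSet<string> RoundTrip(HashSet<string> source)
    {
        var serializer = new DataContractSerializer(typeof(HashSet<string>));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, source);
            stream.Position = 0; // rewind before reading
            return (HashSet<string>)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        var copy = RoundTrip(new HashSet<string> { "One", "Two", "Three" });
        foreach (string item in copy)
        {
            Console.WriteLine(item);
        }
    }
}
```

Here the set is handled as a collection of its elements, so no manual OnDeserialization call is involved. The output is XML rather than a compact binary format, which may or may not matter for your use case.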

Nick Martyshchenko
  • Please back up your opinion on why it is not really good or efficient. – leppie Nov 16 '10 at 12:22
  • That would probably be a separate post. If it is really needed I can post it here or just email it to you. Just some points: 1. Serialize an object with a single `int` field to disk and you end up with a ~153-byte file, since it has to contain the full type names. Compare that with the 4 bytes of the `int` itself. 2. Check the `BinaryFormatter` implementation, or just measure its performance against a plain binary value writer. 3. Don't forget about compatibility issues: if you update an assembly on the server, you have to use some tricks so that deserializing old values does not fail. – Nick Martyshchenko Nov 16 '10 at 12:38
  • In case you are interested: we considered using `BinaryFormatter` as our serialization backend but found it far from optimal in a number of tests. Our distributed system has 10-50 000 nodes; we ended up with our own implementation in 2007, but are thinking about switching to Protocol Buffers now since our solution is pretty similar to it. – Nick Martyshchenko Nov 16 '10 at 12:42
  • Thanks - I just ended up calling ToList() on the HashSets when serializing them... – Frederik Nov 16 '10 at 13:40
  • @leppie, check Marc Gravell's answer in Why is binary serialization faster than xml serialization? http://bit.ly/cVmpvG – Nick Martyshchenko Nov 17 '10 at 11:17
  • @Nick Martyshchenko: Points 1-3 are irrelevant. Although `fwrite` is faster and smaller, you have no type safety (and you lose a whole bunch of other niceties, like cyclic data, etc). If size is a problem, you could just gzip the byte stream. – leppie Nov 17 '10 at 12:27
  • @leppie, "it depends", as usual. Gzipping adds a performance penalty and extra memory footprint per client. We have to process at least 10 000 clients per server. Each additional task (e.g. gzipping) reduces responsiveness and raises hardware requirements. `BinaryFormatter` adds a tremendous amount of metadata with field descriptions (including compiler-generated backing field names), used types, etc. Even gzip can't help you much. Also, due to its heavy dependency on reflection, it is really slow. – Nick Martyshchenko Nov 17 '10 at 13:34
  • @leppie, I don't think I lose type safety with a "plain binary writer". `BinaryFormatter` is just a convenient way to serialize/deserialize data in order to transfer it. Your own implementation can be as safe as you need or want. After all, it is just bytes, and making it `safe` is a convention achieved with some additional wrappers, either the framework's or your own. See also Marc's answers, which give some more info. I updated my answer with links to them. – Nick Martyshchenko Nov 17 '10 at 13:38
  • @Nick Martyshchenko: Have you looked at the options for the BinaryFormatter to make it smaller? Like simple type info, etc? – leppie Nov 17 '10 at 13:39
  • @leppie, the last time I checked was in the middle of 2007. Can you give me an example of how to serialize an int field into a file of at most 8-10 bytes? And how to avoid the performance loss on large objects? – Nick Martyshchenko Nov 17 '10 at 13:42
  • @Nick Martyshchenko: You do! Serialize any type that you cannot take the address of (as in unsafe), and the binary writer is useless. Now you have to design your own container and generate a graph of objects, and it takes just as long as the BinaryFormatter. I do agree, if you have C style structs, the binary writer is best, but in the managed world, such stand-alone 'data' is scarce. – leppie Nov 17 '10 at 13:45
  • @Nick Martyshchenko: Look at `FormatterTypeStyle`, `FormatterAssemblyStyle` and `TypeFilterLevel`. – leppie Nov 17 '10 at 13:47
  • @leppie, by "plain binary writer" I don't mean `System.IO.BinaryWriter`. We implemented our own serializer for basic types (int, float, packed int, arrays, sets, maps, etc.) and generate serializers based on attributes similar to the WCF ones (`DataContract`, `DataMember`). So each object can be serialized either by our serialization backend (just data) or by hand-written serialization logic that uses the basic types or known objects. The latter is used, for example, to build and serialize data deltas or complex types which can be "packed" efficiently by hand. – Nick Martyshchenko Nov 17 '10 at 13:57
  • @leppie, the best I get now is 135 bytes for an int field `k` instead of 153 bytes. But the performance penalty has not gone away. – Nick Martyshchenko Nov 17 '10 at 14:08
  • @Nick Martyshchenko: It does not have to emit the type info everytime. Eg: serialize an int[1000] or a couple of int fields. – leppie Nov 17 '10 at 16:13
  • @leppie, but how does that help? I have commands with data in my protocol, and the commands are usually sent separately from each other. So to send a command I have to start the serializer over and over again, and it will write the type info every time I serialize a command. – Nick Martyshchenko Nov 17 '10 at 18:12
3

The difference is that HashSet<> implements ISerializable and List<> doesn't. The workaround is to call its OnDeserialization() method explicitly, although I'm not sure whether that's the right thing to do.

        var hashset = (HashSet<string>)info.GetValue("hashset", typeof(HashSet<string>));
        hashset.OnDeserialization(this);
        var list = (List<string>)info.GetValue("list", typeof(List<string>));
        // etc..
Hans Passant