3

I have the following code that serializes a List to a byte array for transport via Web Services. The code works relatively fast on smaller entities, but this is a list of 60,000 or so items. It takes several seconds to execute the formatter.Serialize method. Any way to speed this up?

public static byte[] ToBinary(Object objToBinary)
{
    using (MemoryStream memStream = new MemoryStream())
    {
        BinaryFormatter formatter = new BinaryFormatter(null, new StreamingContext(StreamingContextStates.Clone));
        formatter.Serialize(memStream, objToBinary);
        memStream.Seek(0, SeekOrigin.Begin);
        return memStream.ToArray();
    }
}
Dan Atkinson
AngryHacker
  • Why are you manually serialising to a byte array, then letting the web service client re-serialise the byte array as XML? – Christian Hayter Oct 19 '09 at 21:21
  • Christian asks a very good and important question, there! I just blindly assumed there was a good reason for it, but maybe not. Couldn't the OP just generate client proxies with real types and pass those as parameters to the web service methods? – Clay Fowler Oct 19 '09 at 21:25
  • It's a good question. There actually is a reason, but not a good one. The object is defined in an assembly that literally has thousands of objects. If you make VS create proxies for all these objects, the serialization assembly is HUGE, compilation and JIT takes literally forever. So at some point (before me), a decision was made to keep all the DTO objects in an assembly that's referenced by both server and client, then use serialization/deserialization to/from byte arrays to bring objects across. – AngryHacker Oct 19 '09 at 21:44
  • *If you have a choice*, upgrade your service and client to WCF. That allows you to share the contracts between service and client and bypass proxy class generation entirely. – Christian Hayter Oct 20 '09 at 15:43
  • This related question recommends [**dynamic method serialization**](http://stackoverflow.com/questions/852064/faster-deep-cloning/852082#852082): [**Faster deep cloning**](http://stackoverflow.com/questions/852064/faster-deep-cloning) – Dirk Vollmar Oct 19 '09 at 22:26

5 Answers

6

The inefficiency you're experiencing comes from several sources:

  1. The default serialization routine uses reflection to enumerate object fields and get their values.
  2. The binary serialization format stores things in associative lists keyed by the string names of the fields.
  3. You've got a spurious ToArray in there (as Danny mentioned).

You can get a pretty big improvement off the bat by implementing ISerializable on the object type that is contained in your List. That will cut out the default serialization behavior that uses reflection.

You can get a little more speed if you cut down the number of elements in the associative array that holds the serialized data. Make sure the elements you do store in that associative array are primitive types.
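For illustration, a minimal sketch of those two points combined, assuming a hypothetical Item class as the list element (the type and field names are placeholders, not from the question):

using System;
using System.Runtime.Serialization;

[Serializable]
public sealed class Item : ISerializable
{
    public int Id;
    public string Name;
    public double Price;

    public Item() { }

    // BinaryFormatter calls this instead of reflecting over the fields.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // Store only primitives and strings, under short names, to keep
        // the serialized associative list small.
        info.AddValue("i", Id);
        info.AddValue("n", Name);
        info.AddValue("p", Price);
    }

    // Deserialization constructor; the ISerializable counterpart used
    // when BinaryFormatter reads the data back.
    private Item(SerializationInfo info, StreamingContext context)
    {
        Id = info.GetInt32("i");
        Name = info.GetString("n");
        Price = info.GetDouble("p");
    }
}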

Finally, you can eliminate the ToArray, but I doubt you'll even notice the bump that gives you.
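For that last point, if the surrounding code can accept a buffer plus a length instead of an exact-size array, a sketch might look like this (GetBuffer returns the stream's internal buffer without copying; only the first Length bytes are valid):

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

public static ArraySegment<byte> ToBinarySegment(object objToBinary)
{
    // MemoryStream holds no unmanaged resources, so handing out its
    // internal buffer without disposing it first is acceptable here.
    var memStream = new MemoryStream();
    var formatter = new BinaryFormatter(null, new StreamingContext(StreamingContextStates.Clone));
    formatter.Serialize(memStream, objToBinary);
    return new ArraySegment<byte>(memStream.GetBuffer(), 0, (int)memStream.Length);
}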

Kennet Belenky
3

If you want some real serialization speed, consider using protobuf-net, the C# version of Google's Protocol Buffers. It's supposed to be an order of magnitude faster than BinaryFormatter.
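A minimal sketch of what that looks like (the Item type and member numbers here are made up for illustration; the attributes are what drive protobuf-net):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Item
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static byte[] ToProtoBytes<T>(T obj)
{
    using (var ms = new MemoryStream())
    {
        // Writes a compact, tagged binary format with no per-field
        // name strings, unlike BinaryFormatter's output.
        Serializer.Serialize(ms, obj);
        return ms.ToArray();
    }
}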

geva30
  • +1 for this. I've found that for serialising an 80,000 item List it takes 7 seconds for BinaryFormatter and 800ms for protobuf-net. – Callum Rogers Oct 19 '09 at 22:25
  • I'd love to do this, but the effort to convert the entire project to Proto buffers would take forever and would probably not be worth saving 4 seconds in performance gains. – AngryHacker Oct 19 '09 at 22:51
  • Well, I recently converted quite a few classes to Proto, and it is very simple to do using attributes. – geva30 Oct 22 '09 at 07:00
1

It would probably be much faster to serialize the entire array (or collection) of 60,000 items in one shot, into a single large byte[] array, instead of in separate chunks. Is having each individual object represented by its own byte[] array a requirement of other parts of the system you're working within? Also, are the actual Types of the objects known? If you used a specific Type (maybe some common base class of all 60,000 objects), the framework would not have to do as much casting and searching through your prebuilt serialization assemblies. Right now you're only giving it Object.
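As a sketch of the strongly typed, one-shot variant (assuming the element type is visible to both client and server, as described in the comments above):

using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

public static byte[] ToBinary<T>(List<T> items)
{
    using (var memStream = new MemoryStream())
    {
        var formatter = new BinaryFormatter(null, new StreamingContext(StreamingContextStates.Clone));
        // One Serialize call for the whole list rather than one per item.
        formatter.Serialize(memStream, items);
        return memStream.ToArray();
    }
}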

Clay Fowler
1

.ToArray() creates a new array; it may be more efficient to copy the data into an existing array using unsafe methods (such as accessing the stream's memory using fixed, then copying the memory using MemCopy() via DllImport).
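A managed sketch of that idea, using Buffer.BlockCopy out of the stream's internal buffer into a caller-supplied array (no P/Invoke; the destination array and offset are assumed to be provided by the caller):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static int ToBinary(object obj, byte[] destination, int offset)
{
    using (var memStream = new MemoryStream())
    {
        new BinaryFormatter().Serialize(memStream, obj);
        int length = (int)memStream.Length;
        // Copy straight from the stream's internal buffer; no
        // intermediate array is allocated the way ToArray() does.
        Buffer.BlockCopy(memStream.GetBuffer(), 0, destination, offset, length);
        return length;
    }
}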

Also consider using a faster custom formatter.

Danny Varod
0

I started a code-generator project that includes a binary DataContract serializer, which beats Json.NET by a factor of 30. All you need are the generator NuGet package and an additional lib that comes with faster replacements for BitConverter.

Then you create a partial class, decorate it with DataContract, and decorate each serializable property with DataMember. The generator will then create a ToBytes method, and together with the additional lib you can serialize collections as well. Look at my example from this post:

var objects = new List<Td>();
for (int i = 0; i < 1000; i++)
{
    var obj = new Td
    {
        Message = "Hello my friend",
        Code = "Some code that can be put here",
        StartDate = DateTime.Now.AddDays(-7),
        EndDate = DateTime.Now.AddDays(2),
        Cts = new List<Ct>(),
        Tes = new List<Te>()
    };
    for (int j = 0; j < 10; j++)
    {
        obj.Cts.Add(new Ct { Foo = i * j });
        obj.Tes.Add(new Te { Bar = i + j });
    }
    objects.Add(obj);
}

With this generated ToBytes() method:

public int Size
{
    get 
    { 
        var size = 24;
        // Add size for collections and strings
        size += Cts == null ? 0 : Cts.Count * 4;
        size += Tes == null ? 0 : Tes.Count * 4;
        size += Code == null ? 0 : Code.Length;
        size += Message == null ? 0 : Message.Length;

        return size;              
    }
}

public byte[] ToBytes(byte[] bytes, ref int index)
{
    if (index + Size > bytes.Length)
        throw new ArgumentOutOfRangeException("index", "Object does not fit in array");

    // Convert Cts
    // Two bytes length information for each dimension
    GeneratorByteConverter.Include((ushort)(Cts == null ? 0 : Cts.Count), bytes, ref index);
    if (Cts != null)
    {
        for(var i = 0; i < Cts.Count; i++)
        {
            var value = Cts[i];
            value.ToBytes(bytes, ref index);
        }
    }
    // Convert Tes
    // Two bytes length information for each dimension
    GeneratorByteConverter.Include((ushort)(Tes == null ? 0 : Tes.Count), bytes, ref index);
    if (Tes != null)
    {
        for(var i = 0; i < Tes.Count; i++)
        {
            var value = Tes[i];
            value.ToBytes(bytes, ref index);
        }
    }
    // Convert Code
    GeneratorByteConverter.Include(Code, bytes, ref index);
    // Convert Message
    GeneratorByteConverter.Include(Message, bytes, ref index);
    // Convert StartDate
    GeneratorByteConverter.Include(StartDate.ToBinary(), bytes, ref index);
    // Convert EndDate
    GeneratorByteConverter.Include(EndDate.ToBinary(), bytes, ref index);
    return bytes;
}
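
A usage sketch for the list built above, relying only on the generated Size property and the ToBytes signature shown here:

// Allocate one buffer for all objects, then append each one in place.
var total = 0;
foreach (var obj in objects) total += obj.Size;
var buffer = new byte[total];
var index = 0;
foreach (var obj in objects) obj.ToBytes(buffer, ref index);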

It serializes each object in ~1.5 microseconds, so 1,000 objects take roughly 1.7 ms.

Toxantron