8

I wrote a program to serialize a 'Person' class using XmlSerializer, BinaryFormatter and protobuf-net. I expected protobuf-net to be faster than the other two. Protobuf serialization was faster than XML serialization but much slower than binary serialization. Is my understanding incorrect? Please help me understand this. Thank you for the help.

EDIT: I changed the code (updated below) to time only the serialization, not the creation of the streams, and I still see the difference. Could someone tell me why?

Following is the output:-

Person got created using protocol buffer in 347 milliseconds

Person got created using XML in 1462 milliseconds

Person got created using binary in 2 milliseconds

Code below

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ProtoBuf;
using System.IO;
using System.Diagnostics;
using System.Runtime.Serialization.Formatters.Binary;
namespace ProtocolBuffers
{
    class Program
    {
        static void Main(string[] args)
        {

            string folderPath  = @"E:\Ashish\Research\VS Solutions\ProtocolBuffers\ProtocolBuffer1\bin\Debug";
            string XMLSerializedFileName = Path.Combine(folderPath,"PersonXMLSerialized.xml");
            string ProtocolBufferFileName = Path.Combine(folderPath,"PersonProtocalBuffer.bin");
            string BinarySerializedFileName = Path.Combine(folderPath,"PersonBinary.bin");

            if (File.Exists(XMLSerializedFileName))
            {
                File.Delete(XMLSerializedFileName);
                Console.WriteLine(XMLSerializedFileName + " deleted");
            }
            if (File.Exists(ProtocolBufferFileName))
            {
                File.Delete(ProtocolBufferFileName);
                Console.WriteLine(ProtocolBufferFileName + " deleted");
            }
            if (File.Exists(BinarySerializedFileName))
            {
                File.Delete(BinarySerializedFileName);
                Console.WriteLine(BinarySerializedFileName + " deleted");
            }

            var person = new Person
            {
                Id = 12345,
                Name = "Fred",
                Address = new Address
                {
                    Line1 = "Flat 1",
                    Line2 = "The Meadows"
                }
            };

            Stopwatch watch = Stopwatch.StartNew();

            using (var file = File.Create(ProtocolBufferFileName))
            {
                watch.Start();
                Serializer.Serialize(file, person);
                watch.Stop();
            }

            //Console.WriteLine(watch.ElapsedMilliseconds.ToString());
            Console.WriteLine("Person got created using protocol buffer in " + watch.ElapsedMilliseconds.ToString() + " milliseconds ");

            watch.Reset();

            System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(person.GetType());
            using (TextWriter w = new StreamWriter(XMLSerializedFileName))
            {
                watch.Start();
                x.Serialize(w, person);
                watch.Stop();
            }

            //Console.WriteLine(watch.ElapsedMilliseconds.ToString());
            Console.WriteLine("Person got created using XML in " + watch.ElapsedMilliseconds.ToString() + " milliseconds");

            watch.Reset();

            using (Stream stream = File.Open(BinarySerializedFileName, FileMode.Create))
            {
                BinaryFormatter bformatter = new BinaryFormatter();
                //Console.WriteLine("Writing Employee Information");
                watch.Start();
                bformatter.Serialize(stream, person);
                watch.Stop();
            }

            //Console.WriteLine(watch.ElapsedMilliseconds.ToString());
            Console.WriteLine("Person got created using binary in " + watch.ElapsedMilliseconds.ToString() + " milliseconds");

            Console.ReadLine();



        }
    }


    [ProtoContract]
    [Serializable]
    public class Person
    {
        [ProtoMember(1)]
        public int Id { get; set; }
        [ProtoMember(2)]
        public string Name { get; set; }
        [ProtoMember(3)]
        public Address Address { get; set; }
    }
    [ProtoContract]
    [Serializable]
    public class Address
    {
        [ProtoMember(1)]
        public string Line1 { get; set; }
        [ProtoMember(2)]
        public string Line2 { get; set; }
    }
}
djdd87
Ashish Gupta
  • 2
    A few quick notes - first, try to reduce the influence of external factors on your test. Serialize to a memory stream or some other relatively performance-neutral target rather than the file system. Second, you should only time the serialization operation - don't include the creation of your streams or construction of objects. Third, repeat your tests a reasonable number of times and report the aggregated results. – Jeff Sternal Jun 03 '10 at 13:54
  • Thanks for the comments. You mentioned "relatively performance-neutral target rather than the file system". What does that mean? could you please give some examples of a "relatively performance-neutral target"? Thank you. – Ashish Gupta Jun 03 '10 at 13:57
  • 1
    @Ashish - I was thinking primarily of a memory stream. The environment *could* still affect your tests if you serialize to a memory stream (for example, memory pressure might force you to go to virtual memory for one test and not the other), but I think it would be less likely to influence your results than the file system. In retrospect, **it's probably more important to repeat your tests than to try to get absolutely neutral testing conditions**, but striving for those conditions won't hurt. ;) – Jeff Sternal Jun 03 '10 at 14:23
  • @Jeff Sternal - Or at least the StopWatches could be moved within the using statements. Then there'd be no delay in the creation of the file or the closing of the file. – djdd87 Jun 03 '10 at 14:32
  • @Jeff Sternal - Thanks for your time. In practice, I need to create quite a number of files, and I can't keep them in a memory stream. Do you think the aggregate time comparison (for creating a large number of files) would be different? – Ashish Gupta Jun 03 '10 at 14:36
  • @GenericTypeTea, I just edited the code and that didn't make much difference. – Ashish Gupta Jun 03 '10 at 14:41
  • @Ashish, makes about 20ms difference on ProtoBuf. If you do the same operation 1000 times, ProtoBuf comes out faster. I think it may be something to do with ProtoBuf caching the property types the first time it's run and then it's much faster from then on... but you'd need Mr. Gravell to confirm that as it's been a long time since I delved into the Source Code. – djdd87 Jun 03 '10 at 14:44
  • @Ashish, also why use Stopwatch.StartNew() when you're instantly calling Start() anyway? – djdd87 Jun 03 '10 at 14:48
  • @Ashish: GenericTypeTea is right - the first time protobuf-net serializes a given type it needs to use reflection to generate serialization code for that type. Subsequent usage is much faster. See [Marc Gravell's answer here](http://stackoverflow.com/questions/1722096/how-does-protobuf-net-achieve-respectable-performance/1723695#1723695) for details. – Jeff Sternal Jun 03 '10 at 14:56
  • @Jeff - that can be avoided, especially in "v2". – Marc Gravell Jun 03 '10 at 23:05
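The advice in the comments above (serialize to memory, time only the serialization call, warm up once, then repeat) can be sketched as a small harness. This is a minimal sketch, not code from the thread: `SamplePerson`, `SerializationBenchmark` and `Measure` are hypothetical names, and XmlSerializer is used only because it ships with the framework. Any serializer call can be passed in as the action.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the Person class in the question.
public class SamplePerson
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SerializationBenchmark
{
    // Runs the action once untimed (warm-up), then times `iterations` further runs.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up: pays one-off reflection, code generation and JIT costs
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        var person = new SamplePerson { Id = 12345, Name = "Fred" };

        // The serializer is created once, outside the timed region.
        var xml = new XmlSerializer(typeof(SamplePerson));

        using (var buffer = new MemoryStream())
        {
            TimeSpan elapsed = Measure(() =>
            {
                buffer.SetLength(0); // reuse the buffer so it does not grow on every run
                xml.Serialize(buffer, person);
            }, 5000);

            Console.WriteLine("XmlSerializer: 5000 runs in " + elapsed.TotalMilliseconds + " ms");
        }
    }
}
```

Keeping the warm-up call outside the stopwatch separates first-use cost from steady-state cost, which is the distinction the later comments draw for protobuf-net.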

4 Answers

25

I replied to your e-mail; I didn't realise you'd also posted it here. The first question I have is: which version of protobuf-net? The reason I ask is that the development trunk of "v2" deliberately has auto-compilation disabled, so that I can use my unit tests to test both the runtime and pre-compiled versions. So if you are using "v2" (only available in source), you need to tell it to compile the model - otherwise it is running 100% reflection.

In either "v1" or "v2" you can do this with:

Serializer.PrepareSerializer<Person>();

Having done this, the numbers I get (from the code in your e-mail; I haven't checked if the above is the same sample):

10
Person got created using protocol buffer in 10 milliseconds
197
Person got created using XML in 197 milliseconds
3
Person got created using binary in 3 milliseconds

The other factor is the repeats; 3-10ms is frankly nothing; you can't compare numbers around this level. Upping it to repeat 5000 times (re-using the XmlSerializer / BinaryFormatter instances; no false costs introduced) I get:

110
Person got created using protocol buffer in 110 milliseconds
329
Person got created using XML in 329 milliseconds
133
Person got created using binary in 133 milliseconds

Taking this to sillier extremes (100000):

1544
Person got created using protocol buffer in 1544 milliseconds
3009
Person got created using XML in 3009 milliseconds
3087
Person got created using binary in 3087 milliseconds

So ultimately:

  • when you have virtually no data to serialize, most approaches will be very fast (including protobuf-net)
  • as you add data, the differences become more obvious; protobuf generally excels here, either for individual large graphs, or lots of small graphs

Note also that in "v2" the compiled model can be fully static-compiled (to a dll that you can deploy), removing even the (already small) spin-up costs.
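As a rough illustration of the warm-up point, the sketch below calls `Serializer.PrepareSerializer<T>()` before the stopwatch starts, so that the one-off preparation cost is not counted in the timed loop. It assumes the protobuf-net package is referenced. `WarmPerson`, `WarmUpDemo` and `TimeSerialization` are hypothetical names, and the numbers it prints will differ from those quoted above.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using ProtoBuf; // protobuf-net package (assumed to be referenced)

// Hypothetical stand-in for the Person class in the question.
[ProtoContract]
public class WarmPerson
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
}

public static class WarmUpDemo
{
    // Returns the elapsed milliseconds for `iterations` serializations,
    // excluding the one-off preparation of the serializer.
    public static long TimeSerialization(int iterations)
    {
        var person = new WarmPerson { Id = 12345, Name = "Fred" };

        // Pay the spin-up cost before timing starts.
        Serializer.PrepareSerializer<WarmPerson>();

        using (var buffer = new MemoryStream())
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                buffer.SetLength(0); // reuse the same buffer for every run
                Serializer.Serialize(buffer, person);
            }
            watch.Stop();
            return watch.ElapsedMilliseconds;
        }
    }

    public static void Main()
    {
        Console.WriteLine("protobuf-net: 5000 runs in " + TimeSerialization(5000) + " ms");
    }
}
```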

Marc Gravell
  • Marc, absolutely! 'protobuf-serialized' files are smaller, and when you test with a larger number of files, the time is significantly less than with binary. Thank you for your time. :-) – Ashish Gupta Jun 13 '10 at 13:50
  • Think you guys can supplement this with deserialization times as well? It matters too. – David Grenier Jul 28 '11 at 21:45
  • @David I'm not at a PC, but "very fast" is pretty close :) If you *desperately* want that ***for this example*** I can do it, but I have countless other existing measures that all say "fast", in numeric terms - any of those do? Of course, your own data would be even more compelling – Marc Gravell Jul 28 '11 at 21:51
5

I have a slightly different opinion from the accepted answer. I think the numbers from these tests reflect the metadata overhead of BinaryFormatter. BinaryFormatter writes metadata about the class before writing the data, while protobuf writes only the data.

For the very small object (one Person object) in your test, the metadata cost of BinaryFormatter weighs more heavily than it would in real cases, because it is writing more metadata than data. So when you increase the repeat count, the metadata cost is paid again on every call and is exaggerated, up to the same level as XML serialization in the extreme case.

If you serialize a Person array, and the array is large enough, the metadata cost becomes trivial relative to the total cost. BinaryFormatter should then perform similarly to protobuf in your extreme repeat test.
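The amortization argument can be illustrated with a toy format. To be clear, this is not BinaryFormatter's actual wire format: the header string, class name and method names below are all made up for the illustration. It only compares writing a type description once per object against writing it once per batch.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderOverheadDemo
{
    // Made-up type description, standing in for per-call metadata.
    private const string Header = "ProtocolBuffers.Person{Id:Int32,Name:String}";

    private static void WritePayload(BinaryWriter writer, int id, string name)
    {
        writer.Write(id);
        writer.Write(name);
    }

    // One header per object, as when every object is serialized by a separate call.
    public static long SizeWithHeaderPerObject(int count)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            for (int i = 0; i < count; i++)
            {
                writer.Write(Header);
                WritePayload(writer, 12345, "Fred");
            }
            writer.Flush();
            return stream.Length;
        }
    }

    // One header for the whole batch, as when an array is serialized in a single call.
    public static long SizeWithSharedHeader(int count)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            writer.Write(Header);
            writer.Write(count);
            for (int i = 0; i < count; i++)
                WritePayload(writer, 12345, "Fred");
            writer.Flush();
            return stream.Length;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Header per object: " + SizeWithHeaderPerObject(1000) + " bytes");
        Console.WriteLine("Shared header:     " + SizeWithSharedHeader(1000) + " bytes");
    }
}
```

With a payload this small, the per-object variant is dominated by the header, which is the effect described above. Whether the same holds for time rather than size needs to be measured with the real serializers.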

PS: I found this page because I'm evaluating different serializers. I also found a blog post, http://blogs.msdn.com/b/youssefm/archive/2009/07/10/comparing-the-performance-of-net-serializers.aspx, whose test results show DataContractSerializer + binary XmlDictionaryWriter performing several times better than BinaryFormatter. It also tested with very small data. When I ran the test myself with large data, I was surprised to find that the result was very different. So do test with the real data you will actually use.

Dudu
4

We serialize quite large objects (about 50 properties) constantly, so I've written a small test to compare BinaryFormatter and protobuf-net, just as you did and here are my results (10000 objects):

BinaryFormatter serialize: 316
BinaryFormatter deserialize: 279
protobuf serialize: 243
protobuf deserialize: 139
BinaryFormatter serialize: 315
BinaryFormatter deserialize: 281
protobuf serialize: 127
protobuf deserialize: 110

That's obviously a very noticeable difference. protobuf-net is also much faster on the second run (the tests are exactly the same) than it is on the first.

Update: doing RuntimeTypeModel.Add..Compile generates the following results:

BinaryFormatter serialize: 303
BinaryFormatter deserialize: 282
protobuf serialize: 113
protobuf deserialize: 50
BinaryFormatter serialize: 317
BinaryFormatter deserialize: 266
protobuf serialize: 126
protobuf deserialize: 49
Egor Pavlikhin
1

If we compare in memory, hard-coded serialization can be considerably faster in some situations. If your class is simple, it may be better to write your own serializer...

Slightly modified code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ProtoBuf;
using System.IO;
using System.Diagnostics;
using System.Runtime.Serialization.Formatters.Binary;

namespace ProtocolBuffers
{
    class Program
    {
        static void Main(string[] args)
        {

            string folderPath = @"../Debug";
            string XMLSerializedFileName = Path.Combine(folderPath, "PersonXMLSerialized.xml");
            string ProtocolBufferFileName = Path.Combine(folderPath, "PersonProtocalBuffer.bin");
            string BinarySerializedFileName = Path.Combine(folderPath, "PersonBinary.bin");
            string BinarySerialized2FileName = Path.Combine(folderPath, "PersonBinary2.bin");

            if (File.Exists(XMLSerializedFileName))
            {
                File.Delete(XMLSerializedFileName);
                Console.WriteLine(XMLSerializedFileName + " deleted");
            }
            if (File.Exists(ProtocolBufferFileName))
            {
                File.Delete(ProtocolBufferFileName);
                Console.WriteLine(ProtocolBufferFileName + " deleted");
            }
            if (File.Exists(BinarySerializedFileName))
            {
                File.Delete(BinarySerializedFileName);
                Console.WriteLine(BinarySerializedFileName + " deleted");
            }
            if (File.Exists(BinarySerialized2FileName))
            {
                File.Delete(BinarySerialized2FileName);
                Console.WriteLine(BinarySerialized2FileName + " deleted");
            }

            var person = new Person
            {
                Id = 12345,
                Name = "Fred",
                Address = new Address
                {
                    Line1 = "Flat 1",
                    Line2 = "The Meadows"
                }
            };

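            // Create the stopwatch stopped; Stopwatch.StartNew() would start timing
            // immediately and include the stream creation in the first measurement.
            Stopwatch watch = new Stopwatch();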

            using (var file = new MemoryStream())
            //using (var file = File.Create(ProtocolBufferFileName))
            {
                watch.Start();
                for (int i = 0; i < 100000; i++)
                    Serializer.Serialize(file, person);
                watch.Stop();
            }

            Console.WriteLine("Person got created using protocol buffer in " + watch.ElapsedMilliseconds.ToString() + " milliseconds ");

            watch.Reset();

            System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(person.GetType());
            using (var w = new MemoryStream())
            //using (TextWriter w = new StreamWriter(XMLSerializedFileName))
            {
                watch.Start();
                for (int i = 0; i < 100000; i++)
                    x.Serialize(w, person);
                watch.Stop();
            }

            Console.WriteLine("Person got created using XML in " + watch.ElapsedMilliseconds.ToString() + " milliseconds");

            watch.Reset();

            using (var stream = new MemoryStream())
            //using (Stream stream = File.Open(BinarySerializedFileName, FileMode.Create))
            {
                BinaryFormatter bformatter = new BinaryFormatter();
                watch.Start();
                for (int i = 0; i < 100000; i++)
                    bformatter.Serialize(stream, person);
                watch.Stop();
            }

            Console.WriteLine("Person got created using binary in " + watch.ElapsedMilliseconds.ToString() + " milliseconds");

            watch.Reset();

            using (var stream = new MemoryStream())
            //using (Stream stream = File.Open(BinarySerialized2FileName, FileMode.Create))
            {
                BinaryWriter writer = new BinaryWriter(stream);
                watch.Start();
                for (int i = 0; i < 100000; i++)
                    writer.Write(person.GetBytes());
                watch.Stop();
            }

            Console.WriteLine("Person got created using binary2 in " + watch.ElapsedMilliseconds.ToString() + " milliseconds");

            Console.ReadLine();
        }
    }


    [ProtoContract]
    [Serializable]
    public class Person
    {
        [ProtoMember(1)]
        public int Id { get; set; }
        [ProtoMember(2)]
        public string Name { get; set; }
        [ProtoMember(3)]
        public Address Address { get; set; }

        public byte[] GetBytes()
        {
            using (var stream = new MemoryStream())
            {
                BinaryWriter writer = new BinaryWriter(stream);

                writer.Write(this.Id);
                writer.Write(this.Name);
                writer.Write(this.Address.GetBytes());

                return stream.ToArray();
            }
        }

        public Person()
        {
        }

        public Person(byte[] bytes)
        {
            using (var stream = new MemoryStream(bytes))
            {
                BinaryReader reader = new BinaryReader(stream);

                Id = reader.ReadInt32();
                Name = reader.ReadString();

                // Everything after Id and Name is the serialized Address.
                int bytesForAddressLength = (int)(stream.Length - stream.Position);
                byte[] bytesForAddress = new byte[bytesForAddressLength];
                Array.Copy(bytes, (int)stream.Position, bytesForAddress, 0, bytesForAddressLength);
                Address = new Address(bytesForAddress);
            }
        }
    }
    [ProtoContract]
    [Serializable]
    public class Address
    {
        [ProtoMember(1)]
        public string Line1 { get; set; }
        [ProtoMember(2)]
        public string Line2 { get; set; }

        public byte[] GetBytes()
        {
            using(var stream = new MemoryStream())
            {
                BinaryWriter writer = new BinaryWriter(stream);

                writer.Write(this.Line1);
                writer.Write(this.Line2);

                return stream.ToArray();
            }
        }

        public Address()
        {

        }

        public Address(byte[] bytes)
        {
            using(var stream = new MemoryStream(bytes))
            {
                BinaryReader reader = new BinaryReader(stream);

                Line1 = reader.ReadString();
                Line2 = reader.ReadString();
            }
        }
    }
}

and my results:

Person got created using protocol buffer in 141 milliseconds
Person got created using XML in 676 milliseconds
Person got created using binary in 525 milliseconds
Person got created using binary2 in 79 milliseconds
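For reference, a hand-written serializer can also write straight to the target stream instead of building an intermediate byte array per object. The following is a minimal sketch along those lines, not the code measured above: `ManualPerson`, `ManualAddress` and `ManualSerializer` are hypothetical names, and the presence flag for the nested address is an assumption of this sketch.

```csharp
using System;
using System.IO;
using System.Text;

public class ManualAddress
{
    public string Line1 { get; set; }
    public string Line2 { get; set; }
}

public class ManualPerson
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ManualAddress Address { get; set; }
}

public static class ManualSerializer
{
    public static void Write(BinaryWriter writer, ManualPerson person)
    {
        writer.Write(person.Id);
        // BinaryWriter.Write(string) throws on null, so null is stored as an empty string.
        writer.Write(person.Name ?? string.Empty);
        // Presence flag so a missing Address can be represented.
        writer.Write(person.Address != null);
        if (person.Address != null)
        {
            writer.Write(person.Address.Line1 ?? string.Empty);
            writer.Write(person.Address.Line2 ?? string.Empty);
        }
    }

    public static ManualPerson Read(BinaryReader reader)
    {
        // Fields are read back in exactly the order they were written.
        var person = new ManualPerson();
        person.Id = reader.ReadInt32();
        person.Name = reader.ReadString();
        if (reader.ReadBoolean())
        {
            var address = new ManualAddress();
            address.Line1 = reader.ReadString();
            address.Line2 = reader.ReadString();
            person.Address = address;
        }
        return person;
    }

    public static void Main()
    {
        var original = new ManualPerson
        {
            Id = 12345,
            Name = "Fred",
            Address = new ManualAddress { Line1 = "Flat 1", Line2 = "The Meadows" }
        };

        using (var stream = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, true))
                Write(writer, original);

            stream.Position = 0;
            using (var reader = new BinaryReader(stream, Encoding.UTF8, true))
            {
                ManualPerson copy = Read(reader);
                Console.WriteLine(copy.Id + ", " + copy.Name + ", " + copy.Address.Line1);
            }
        }
    }
}
```

The trade-off is that a format like this has no field tags or versioning: adding or reordering a field breaks previously written data, which is one of the things protobuf handles for you.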
skuvv