7

This is for small payloads.

I am looking to achieve 1,000,000,000 serializations per 100 ms.

The standard BinaryFormatter is very slow. The DataContractSerializer is slower than BinaryFormatter.

Protocol Buffers (http://code.google.com/p/protobuf-net/) seems slower than the BinaryFormatter for small objects!

Are there any other serialization mechanisms I should be looking at, either hand-rolled code or open source projects?

EDIT: I am serializing in memory and then transmitting the payload over TCP on an async socket. The payloads are generated in memory and are small double arrays (10 to 500 points) with a ulong identifier.

johnnyRose
RogerGreen
  • 7
    10,000 items in a micro-second? What kind of hardware are you running on? – Oded Feb 05 '11 at 10:39
  • 1
    Have you tried running it in parallel to speed things up? – Hawxby Feb 05 '11 at 10:40
  • 7
    Protobuf is the fastest I know of. But your demands are insane. 0.1 nanoseconds per item is a fraction of a cycle on common hardware. – CodesInChaos Feb 05 '11 at 10:41
  • I doubt even a database could do that speed – Chris S Feb 05 '11 at 10:43
  • I bet that it's cheaper to buy one more server than having to investigate and/or create a faster serializer. – jgauffin Feb 05 '11 at 10:58
  • 8
    22 bytes per array minimum * 1Billion * 0.1 seconds means you require 220GBps = 1760 Gbps connection MINIMUM. That's pretty fast. That's over 16 times Verizon's planned backbone speed of 100Gbps. The fastest trunk line currently available is only 1.6Tbs, less than your MINIMUM requirements. Basically, you got no chance. Additionally, you should write a custom serializer given the simplicity of your data format. – DanielOfTaebl Sep 12 '11 at 11:15
  • Serializing only the difference between the previous and current state can give you what you want if changes are rare – Andriy Tylychko Nov 01 '13 at 13:41
  • possible duplicate of [Fastest way to serialize and deserialize .NET object](http://stackoverflow.com/questions/4143421/fastest-way-to-serialize-and-deserialize-net-object), [performance-tests-of-serializations-used-by-wcf-bindings?](http://stackoverflow.com/questions/3790728/performance-tests-of-serializations-used-by-wcf-bindings?lq=1) – nawfal Jul 10 '14 at 10:41
  • How small are we talking? – Toxantron Apr 25 '16 at 08:55

9 Answers

11

Your performance requirement restricts the available serializers to 0. A custom BinaryWriter and BinaryReader would be the fastest you could get.
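
A minimal sketch of that hand-rolled approach for the payload described in the question (a ulong identifier plus a small double array); the PayloadCodec name and the exact wire layout are just assumptions for illustration:

    using System.IO;

    static class PayloadCodec
    {
        // Hypothetical payload shape: a ulong identifier plus a small array of doubles.
        public static byte[] Serialize(ulong id, double[] points)
        {
            using (var ms = new MemoryStream(12 + points.Length * 8))
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(id);             // 8 bytes
                writer.Write(points.Length);  // 4-byte length prefix so the reader knows how much follows
                for (int i = 0; i < points.Length; i++)
                    writer.Write(points[i]);  // 8 bytes each
                return ms.ToArray();
            }
        }

        public static void Deserialize(byte[] data, out ulong id, out double[] points)
        {
            using (var ms = new MemoryStream(data))
            using (var reader = new BinaryReader(ms))
            {
                id = reader.ReadUInt64();
                int count = reader.ReadInt32();
                points = new double[count];
                for (int i = 0; i < count; i++)
                    points[i] = reader.ReadDouble();
            }
        }
    }

A caller would simply do `byte[] buffer = PayloadCodec.Serialize(id, points);` before handing the buffer to the socket.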

Darin Dimitrov
6

I'd have expected Protobuf-net to be faster even for small objects... but you may want to try my Protocol Buffer port as well. I haven't used Marc's port for a while - mine was faster when I last benchmarked, but I'm aware that he's gone through a complete rewrite since then :)

I doubt that you'll achieve serializing a billion items in 100ms whatever you do though... I think that's simply an unreasonable expectation, especially if this is writing to disk. (Obviously if you're simply overwriting the same bit of memory repeatedly you'll get a lot better performance than serializing to disk, but I doubt that's really what you're trying to do.)

If you can give us more context, we may be able to help more. Are you able to spread the load out over multiple machines, for example? (Multiple cores serializing to the same IO device is unlikely to help, as I wouldn't expect this to be a CPU-bound operation if it's writing to a disk or the network.)

EDIT: Suppose each object is 10 doubles (8 bytes each) with a ulong identifier (8 bytes). That's 88 bytes per object at minimum, so you're trying to serialize at least 88GB in 100ms. I really don't think that's achievable, whatever you use.

I'm running my Protocol Buffers benchmarks now (they give bytes serialized per second) but I highly doubt they'll give you what you want.

openshac
Jon Skeet
Hi, thanks for your reply. I am serializing in memory then transmitting the payload over TCP on an async socket. I'll try your PB lib and revert. – RogerGreen Feb 05 '11 at 10:46
  • 3
    @RogerGreen: Do you really expect to be able to transmit a billion objects in 100ms over a network? It's simply *not going to happen*, even if you can do that much in memory. The serialization itself is *not* going to be the bottleneck. – Jon Skeet Feb 05 '11 at 10:56
@RogerGreen what's your connection speed, 8Gb per sec? – Shekhar_Pro Feb 05 '11 at 11:16
4

You claim protobuf-net is slower than BinaryFormatter for small items, but every time I've measured it I've found the exact opposite, for example:

Performance Tests of Serializations used by WCF Bindings

I conclude, especially with the v2 code, that this may well be your fastest option. If you can post your specific benchmark scenario I'll happily help see what is "up"... If you can't post it here, emailing it to me directly (see profile) would be OK too. I don't know if your stated timings are possible under any scheme, but I'm very sure I can get you a lot faster than whatever you are seeing.

With the v2 code, the CompileInPlace gives the fastest result - it allows some IL tricks that it can't use if compiling to a physical dll.
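
For reference, a minimal sketch of the v2 RuntimeTypeModel/CompileInPlace route might look something like this (the Payload type and its members are assumed purely for illustration):

    using System.IO;
    using ProtoBuf;
    using ProtoBuf.Meta;

    [ProtoContract]
    public class Payload
    {
        [ProtoMember(1)] public ulong Id;
        [ProtoMember(2)] public double[] Points;
    }

    class Demo
    {
        static void Main()
        {
            var model = RuntimeTypeModel.Create();
            model.Add(typeof(Payload), true); // pick up the [ProtoMember] attributes
            model.CompileInPlace();           // pre-compile the serializer in place (the fast v2 path)

            var payload = new Payload { Id = 42, Points = new[] { 1.1, 2.2, 3.3 } };
            using (var ms = new MemoryStream())
            {
                model.Serialize(ms, payload);
                ms.Position = 0;
                var clone = (Payload)model.Deserialize(ms, null, typeof(Payload));
            }
        }
    }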

Marc Gravell
3

The only reason to serialize objects is to make them compatible with a generic transport medium. Network, disk, etc. The perf of the serializer never matters because the transport medium is always so much slower than the raw perf of a CPU core. Easily by two orders of magnitude or more.

Which is also the reason that attributes are an acceptable trade-off. They are also I/O bound: their initialization data has to be read from the assembly metadata, which requires a disk read the first time.

So, if you are setting perf requirements, you need to focus 99% on the capability of the transport medium. A billion 'payloads' in 100 milliseconds requires very beefy hardware. Assume a payload is 16 bytes, you'll need to move 160 gigabytes in a second. This is quite beyond even the memory bus bandwidth inside the machine. DDR RAM moves at about 5 gigabytes per second. A one gigabit Ethernet NIC moves at 125 megabytes per second, burst. A commodity hard drive moves at 65 megabytes per second, assuming no seeking.

Your goal is not realistic with current hardware capabilities.

Hans Passant
  • Of course, different serializers have different bandwidth profiles, and while I agree that CPU isn't the biggest problem (compared with bandwidth), it can matter - for example, IPC on the same box, or using serialization as an in-process snapshot/clone/memento etc. – Marc Gravell Feb 05 '11 at 16:07
  • It could also be something crazy extreme like fusionio – Marc Gravell Feb 05 '11 at 16:08
0

This is the FASTEST approach I'm aware of. It does have its drawbacks. Like a rocket, you wouldn't want it on your car, but it has its place. For example, you need to set up your structs and have that same struct on both ends of your pipe. The struct also needs to be a fixed size, or it gets more complicated than this example.

Here is the perf I get on my machine (i7 920, 12 GB RAM), Release mode, without debugger attached. It uses 100% CPU during the test, so this test is CPU bound.

Finished in 3421ms, Processed 52.15 GB
For data write rate of 15.25 GB/s
Round trip passed

.. and the code...

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            int arraySize = 100;
            int iterations = 10000000;
            ms[] msa = new ms[arraySize];
            for (int i = 0; i < arraySize; i++)
            {
                msa[i].d1 = i + .1d;
                msa[i].d2 = i + .2d;
                msa[i].d3 = i + .3d;
                msa[i].d4 = i + .4d;
                msa[i].d5 = i + .5d;
                msa[i].d6 = i + .6d;
                msa[i].d7 = i + .7d;
            }

            int sizeOfms = Marshal.SizeOf(typeof(ms));
            byte[] bytes = new byte[arraySize * sizeOfms];

            TestPerf(arraySize, iterations, msa, sizeOfms, bytes);

            // Let's round trip it.
            var msa2 = new ms[arraySize]; // Array of structs we want to push the bytes into
            var handle2 = GCHandle.Alloc(msa2, GCHandleType.Pinned); // Get a handle to that array
            Marshal.Copy(bytes, 0, handle2.AddrOfPinnedObject(), bytes.Length); // Do the copy
            handle2.Free(); // Clean up the handle

            // Assert that we didn't lose any data.
            var passed = true;
            for (int i = 0; i < arraySize; i++)
            {
                if (msa[i].d1 != msa2[i].d1
                    || msa[i].d2 != msa2[i].d2
                    || msa[i].d3 != msa2[i].d3
                    || msa[i].d4 != msa2[i].d4
                    || msa[i].d5 != msa2[i].d5
                    || msa[i].d6 != msa2[i].d6
                    || msa[i].d7 != msa2[i].d7)
                {
                    passed = false;
                    break;
                }
            }
            Console.WriteLine("Round trip {0}", passed ? "passed" : "failed");
        }

        private static void TestPerf(int arraySize, int iterations, ms[] msa, int sizeOfms, byte[] bytes)
        {
            // Start benchmark.
            var sw = Stopwatch.StartNew();
            // This cheats a little bit and reuses the same buffer
            // for each thread, which would not work IRL.
            var plr = Parallel.For(0, iterations / 1000, i => // Just to be nice to the task pool, chunk tasks into 1000s
                {
                    for (int j = 0; j < 1000; j++)
                    {
                        // Get a handle to the struct[] we want to copy from
                        var handle = GCHandle.Alloc(msa, GCHandleType.Pinned);
                        Marshal.Copy(handle.AddrOfPinnedObject(), bytes, 0, bytes.Length); // Copy from it
                        handle.Free(); // Clean up the handle
                        // Here you would want to write to some buffer or something :)
                    }
                });
            // Stop benchmark
            sw.Stop();
            var size = arraySize * sizeOfms * (double)iterations / 1024 / 1024 / 1024d; // Convert from bytes to GB
            Console.WriteLine("Finished in {0}ms, Processed {1:N} GB", sw.ElapsedMilliseconds, size);
            Console.WriteLine("For data write rate of {0:N} GB/s", size / (sw.ElapsedMilliseconds / 1000d));
        }
    }

    [StructLayout(LayoutKind.Explicit, Size = 56, Pack = 1)]
    struct ms
    {
        [FieldOffset(0)]
        public double d1;
        [FieldOffset(8)]
        public double d2;
        [FieldOffset(16)]
        public double d3;
        [FieldOffset(24)]
        public double d4;
        [FieldOffset(32)]
        public double d5;
        [FieldOffset(40)]
        public double d6;
        [FieldOffset(48)]
        public double d7;
    }
Austin Harris
0

If you don't want to take the time to implement a comprehensive explicit serialization/de-serialization mechanism, try this: http://james.newtonking.com/json/help/html/JsonNetVsDotNetSerializers.htm ...

In my usage with large objects (1 GB+ when serialized to disk), I find that the file generated by the Newtonsoft library is 4.5 times smaller and takes one sixth of the time to process compared with the BinaryFormatter.
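
A minimal sketch of the Json.NET route, assuming a simple payload type (the Payload class here is illustrative):

    using Newtonsoft.Json;

    public class Payload
    {
        public ulong Id { get; set; }
        public double[] Points { get; set; }
    }

    class Demo
    {
        static void Main()
        {
            var payload = new Payload { Id = 42, Points = new[] { 1.1, 2.2, 3.3 } };

            // Serialize to a JSON string and back again.
            string json = JsonConvert.SerializeObject(payload);
            Payload roundTripped = JsonConvert.DeserializeObject<Payload>(json);
        }
    }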

arachnode.net
0

You could write custom serialization by implementing ISerializable on your data structures. Even so, you will probably face some "impedance" from the hardware itself when trying to serialize with these requirements.
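
A rough sketch of what implementing ISerializable could look like for a payload shaped like the one in the question (the Payload type and field names are assumptions):

    using System;
    using System.Runtime.Serialization;

    [Serializable]
    public class Payload : ISerializable
    {
        public ulong Id;
        public double[] Points;

        public Payload() { }

        // Called by the formatter when serializing; write only what you need.
        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("id", Id);
            info.AddValue("points", Points);
        }

        // Special constructor called by the formatter when deserializing.
        protected Payload(SerializationInfo info, StreamingContext context)
        {
            Id = info.GetUInt64("id");
            Points = (double[])info.GetValue("points", typeof(double[]));
        }
    }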

Felice Pollano
0

protobuf-net is really quick but has its limitations. => http://code.google.com/p/protobuf-net/wiki/Performance

RaM
0

In my experience, Marc's Protocol Buffers implementation is very good. I haven't used Jon's. However, you should be trying to use techniques to minimise the data and not serialise the whole lot.

I would have a look at the following.

  1. If the messages are small, you should look at what entropy you have. You may have fields that can be partially or completely de-duplicated. If the communication is between two parties only, you may get benefits from building a dictionary at both ends.

  2. You are using TCP, which has enough overhead even without a payload on top. You should minimise this by batching your messages into larger bundles and/or look at UDP instead. Batching, when combined with #1, may get you closer to your requirement when you average your total communication out.

  3. Is the full data width of double required or is it for convenience? If the extra bits are not used this will be a chance for optimisation when converting to a binary stream.

Generally, generic serialisation is great when you have multiple messages to handle over a single interface, or you don't know the full implementation details. In this case it would probably be better to build your own serialisation methods to convert a single message structure directly to byte arrays (see the sketch below). Since you know the full implementation on both sides, direct conversion won't be a problem. It would also ensure that you can inline the code and prevent boxing/unboxing as much as possible.
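
As a rough illustration of point #3 combined with the direct-conversion idea, something along these lines could work; the Message type, the narrowing from double to float, and the layout are all assumptions for the sketch:

    using System;

    struct Message
    {
        public ulong Id;
        public double[] Points;
    }

    static class MessageCodec
    {
        // Hand-rolled conversion straight to a byte[]; values are narrowed to
        // float (4 bytes each) on the assumption that full double precision is not needed.
        public static byte[] ToBytes(Message m)
        {
            var buffer = new byte[8 + 4 + m.Points.Length * 4];
            Buffer.BlockCopy(BitConverter.GetBytes(m.Id), 0, buffer, 0, 8);
            Buffer.BlockCopy(BitConverter.GetBytes(m.Points.Length), 0, buffer, 8, 4);
            for (int i = 0; i < m.Points.Length; i++)
            {
                float narrowed = (float)m.Points[i];
                Buffer.BlockCopy(BitConverter.GetBytes(narrowed), 0, buffer, 12 + i * 4, 4);
            }
            return buffer;
        }
    }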

Bernie White