Is there any situation in which serializing the same object could produce different streams (assuming one of the formatters built into .NET is used for both serializations)?

This came up in the discussion below this post. The claim was made that this can happen, yet no concrete explanation was offered, so I was wondering if anyone can shed some light on the issue?

Branko Dimitrijevic
  • OK, it seems I have to prove it. **I will try to make a sample app.** – Aliostad Oct 31 '11 at 13:20
  • I don't believe that to be the case. The stream instances may be different, but the content should be the same. Assuming that their state isn't modified during serialization. – Ian Oct 31 '11 at 13:21
  • @Ian see my sample code to create it. – Aliostad Oct 31 '11 at 13:48

3 Answers

As I explained in the comments on that SO question, the issue is caused (at least in the case I have discovered) by an optimisation of the string output. It seems that if several strings are the same reference, the formatter writes the string only once.

So what the sample code does is use one long string for all the properties of an object, change the reference of one of them (keeping the value the same), and then serialise. It then deserialises the stream back into an object, points all three properties at the same string reference, and serialises again. This time the stream is smaller.

OK, here is the proof code:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Proof
{
    public string S1 { get; set; }
    public string S2 { get; set; }
    public string S3 { get; set; }
}

class Program
{
    static void Main(string[] args)
    {

        const string LongString =
            "A value that is going to change the world nad iasjdsioajdsadj sai sioadj sioadj siopajsa iopsja iosadio jsadiojasd ";

        var proof = new Proof() {
            S1 = LongString,
            S2 = LongString,
            S3 = LongString
        };

        // Rebuild the string from two substrings: the value is unchanged,
        // but S2 now refers to a different string instance than S1 and S3.
        proof.S2 = LongString.Substring(0, 10) + LongString.Substring(10);

        Console.WriteLine(proof.S1 == proof.S2);
        Console.WriteLine(proof.S1 == proof.S3);
        Console.WriteLine(proof.S2 == proof.S3);
        Console.WriteLine("So the values are all the same...");

        BinaryFormatter bf = new BinaryFormatter();
        MemoryStream stream = new MemoryStream();
        bf.Serialize(stream, proof);
        byte[] buffer = stream.ToArray();
        Console.WriteLine("buffer length is " + buffer.Length); // outputs 449 on my machine
        stream.Position = 0;
        var deserProof = (Proof) bf.Deserialize(new MemoryStream(buffer));
        deserProof.S1 = deserProof.S2;
        deserProof.S3 = deserProof.S2;
        MemoryStream stream2 = new MemoryStream();
        new BinaryFormatter().Serialize(stream2, deserProof);

        Console.WriteLine("buffer length now is " + stream2.ToArray().Length); // outputs 333 on my machine!!
        Console.WriteLine("What? I cannot believe my eyes! Someone help me ........");

        Console.Read();
    }
}
Aliostad
  • nice example; I find it interesting that it went to the trouble of doing a reference check, but without special-casing string equality (since it is such a common one, especially when loading values from a database example). I sigh a happy sigh that *my* serializer knows to check strings for more than just reference equality ;p – Marc Gravell Oct 31 '11 at 13:48
  • I should also add: protobuf also does not guarantee this. OK, in reality it **will** output the same thing, but I had an interesting discussion on the google forum about subnormal forms when representing some of the core data-types (i.e. ways of **validly** representing the same data intentionally using extra bytes) – Marc Gravell Oct 31 '11 at 13:51
Some core types (DateTime - noting in particular the "kind", or Decimal noting the scale) could have values that report as equal, but which serialize differently; for example:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
[Serializable]
class Foo
{
    public decimal Bar { get; set; }
    static void Main()
    {
        Foo foo = new Foo();
        foo.Bar = 123.45M;
        var s = Serialize(foo);

        Foo foo2 = new Foo();
        foo2.Bar = 123.4500M;

        bool areSame = foo.Bar == foo2.Bar; // true
        var s2 = Serialize(foo2);

        bool areSerializedTheSame = s == s2; // false
    }
    static string Serialize(object obj)
    {
        using (var ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, obj);
            return Convert.ToBase64String(ms.GetBuffer(), 0, (int)ms.Length);
        }
    }
}
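
As a rough sketch of why equal values can serialize differently, the snippet below inspects the raw state of the two kinds of values mentioned above without involving any formatter. `RepresentationDemo` and its two helper methods are hypothetical names introduced only for this illustration; `decimal.GetBits` and `DateTime.ToBinary` are standard framework calls. It does not show what `BinaryFormatter` actually writes, only that values which compare equal carry different internal state that a faithful serializer has to preserve.

```csharp
using System;

// Hypothetical helper class, introduced only for this illustration.
static class RepresentationDemo
{
    // True when the two decimals compare equal but their raw 128-bit layouts differ
    // (for example because they carry a different scale).
    public static bool EqualButDifferentBits(decimal a, decimal b)
    {
        int[] bitsA = decimal.GetBits(a);
        int[] bitsB = decimal.GetBits(b);
        bool sameBits = true;
        for (int i = 0; i < bitsA.Length; i++)
        {
            if (bitsA[i] != bitsB[i]) sameBits = false;
        }
        return a == b && !sameBits;
    }

    // True when the two DateTime values compare equal (same ticks) but their
    // binary form, which also encodes the Kind, differs.
    public static bool EqualButDifferentBinary(DateTime a, DateTime b)
    {
        return a == b && a.ToBinary() != b.ToBinary();
    }

    static void Main()
    {
        // Same numeric value, different scale (2 versus 4 decimal places).
        Console.WriteLine(EqualButDifferentBits(123.45M, 123.4500M)); // True

        // Same ticks, different Kind (Unspecified versus Utc).
        var unspecified = new DateTime(2011, 10, 31, 13, 0, 0, DateTimeKind.Unspecified);
        var utc = DateTime.SpecifyKind(unspecified, DateTimeKind.Utc);
        Console.WriteLine(EqualButDifferentBinary(unspecified, utc)); // True
    }
}
```

Equality on these types deliberately ignores the scale and the kind, which is why an equality check and a byte-for-byte comparison of serialized output can disagree.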

As to whether the exact same object could serialize differently - well, that isn't usually something serialization makes any guarantee about. Serialization is about storing and retrieving your data/objects - not about proving equality. For example, XML has all kinds of whitespace and namespace normalization issues that make it unsuitable for comparison. BinaryFormatter and other binary serializers look more tempting, but I'm not sure that they guarantee the behaviour you are looking for.

I would not really trust a comparison made via such an approach unless the serializer explicitly made this guarantee.

Marc Gravell
Depends a little on what you call the 'same' object.

But yes, 2 instances could compare as Equal=true and still produce different streams.

  • very trivially with a broken or limited override of Equals
  • because of subtle differences caused by normalizations or the order of operations.

I verified that adding the same elements to a Dictionary in a different order produces a different stream. I assume you would consider 2 dictionaries with the same content 'equal'.
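
A minimal sketch of that dictionary check might look like the following. It assumes a runtime where `BinaryFormatter` is still available (for example .NET Framework; the type is obsolete and disabled on recent .NET versions). `DictionaryOrderDemo`, `Serialize` and `SameContent` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper class, introduced only for this illustration.
static class DictionaryOrderDemo
{
    // Serializes any [Serializable] object graph to a byte array.
    public static byte[] Serialize(object obj)
    {
        using (var ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, obj);
            return ms.ToArray();
        }
    }

    // Order-independent comparison: same keys mapped to the same values.
    public static bool SameContent(Dictionary<string, int> x, Dictionary<string, int> y)
    {
        if (x.Count != y.Count) return false;
        foreach (var pair in x)
        {
            int other;
            if (!y.TryGetValue(pair.Key, out other) || other != pair.Value) return false;
        }
        return true;
    }

    static void Main()
    {
        var a = new Dictionary<string, int>();
        a.Add("one", 1);
        a.Add("two", 2);

        // Same entries, added in the opposite order.
        var b = new Dictionary<string, int>();
        b.Add("two", 2);
        b.Add("one", 1);

        Console.WriteLine("same content: " + SameContent(a, b));
        Console.WriteLine("same bytes:   " + Serialize(a).SequenceEqual(Serialize(b)));
    }
}
```

The first line should report that the contents match; according to the observation above, the second should report that the bytes do not.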

H H
  • A badly written serializer may be able to do this as well? Imagine a serializer that outputs random data, in a field which is ignored during deserialization. – mah Oct 31 '11 at 13:27
  • @mah: possible but unlikely. But a serializer often generates IDs, no reason it has to do so in a repeatable manner. – H H Oct 31 '11 at 13:36
  • @mah Does any of the .NET built-in formatters behave even remotely like this? I'm aware you could do all manner of weird stuff if you write it yourself, but what about ones that come with .NET? – Branko Dimitrijevic Oct 31 '11 at 13:43