50

I normally write all parts of the code in C#, and for serialized protocols I use FastSerializer, which serializes and deserializes classes quickly and efficiently. It is also very easy to use, and "versioning" (i.e. handling different versions of the serialized format) is fairly straightforward. What I normally use looks like this:

public override void DeserializeOwnedData(SerializationReader reader, object context)
{
    base.DeserializeOwnedData(reader, context);
    byte serializeVersion = reader.ReadByte(); // used to keep what version we are using

    this.CustomerNumber = reader.ReadString();
    this.HomeAddress = reader.ReadString();
    this.ZipCode = reader.ReadString();
    this.HomeCity = reader.ReadString();
    if (serializeVersion > 0)
        this.HomeAddressObj = reader.ReadUInt32();
    if (serializeVersion > 1)
        this.County = reader.ReadString();
    if (serializeVersion > 2)
        this.Muni = reader.ReadString();
    if (serializeVersion > 3)
        this._AvailableCustomers = reader.ReadList<uint>();
}

and

public override void SerializeOwnedData(SerializationWriter writer, object context)
{
    base.SerializeOwnedData(writer, context);
    byte serializeVersion = 4; // current version, read back first by the deserializer
    writer.Write(serializeVersion);

    writer.Write(CustomerNumber);
    writer.Write(PopulationRegistryNumber);
    writer.Write(HomeAddress);
    writer.Write(ZipCode);
    writer.Write(HomeCity);
    if (CustomerCards == null)
        CustomerCards = new List<uint>();
    writer.Write(CustomerCards);

    // v 1
    writer.Write(HomeAddressObj);

    // v 2
    writer.Write(County);

    // v 3
    writer.Write(Muni);

    // v 4
    if (_AvailableCustomers == null)
        _AvailableCustomers = new List<uint>();
    writer.Write(_AvailableCustomers);
}

So it's easy to add new things, or to change the serialization completely if one chooses to.

However, I now want to use JSON, for reasons that aren't relevant here =) I am currently using DataContractJsonSerializer, and I am looking for a way to get the same flexibility I have with FastSerializer above.

So the question is: what is the best way to design a JSON protocol/serialization with the same fine-grained control as above, so that I don't break the serialization just because another machine hasn't updated to the new version yet?

Ted

4 Answers

47

The key to versioning JSON is to always add new properties, and never remove or rename existing properties. This is similar to how protocol buffers handle versioning.

For example, if you started with the following JSON:

{
  "version": "1.0",
  "foo": true
}

And you want to rename the "foo" property to "bar", don't just rename it. Instead, add a new property:

{
  "version": "1.1",
  "foo": true,
  "bar": true
}

Since you are never removing properties, clients based on older versions will continue to work. The downside of this method is that the JSON can get bloated over time, and you have to continue maintaining old properties.
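On the consuming side, a reader can prefer the new name and fall back to the old one. The following is only a minimal sketch, assuming System.Text.Json is available; the class name `RenamedPropertyReader` and the default of `false` when neither property is present are illustrative assumptions, not part of the answer above.

```csharp
using System.Text.Json;

public static class RenamedPropertyReader
{
    // Reads the flag from the new name ("bar") if present,
    // otherwise from the legacy name ("foo").
    public static bool ReadFlag(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        if (root.TryGetProperty("bar", out JsonElement bar))
            return bar.GetBoolean();

        if (root.TryGetProperty("foo", out JsonElement foo))
            return foo.GetBoolean();

        // Assumed default when neither property is present.
        return false;
    }
}
```

The same reader then handles both the 1.0 and the 1.1 payloads shown above without inspecting the version field at all.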

It is also important to clearly define your "edge" cases for your clients. Suppose you have an array property called "fooList". It could take any of the following values: missing (the property is not physically present in the JSON object, which a JavaScript client sees as undefined), null, an empty list, or a list with one or more values. It is important that clients understand how to behave, especially in the missing/null/empty cases.
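To make the four "fooList" cases concrete, here is a rough sketch of how a C# client might tell them apart, assuming System.Text.Json. The `ListState` enum and `FooListInspector` class are hypothetical names introduced for illustration, and treating a non-array value as an error is an assumption about the contract.

```csharp
using System.Text.Json;

public enum ListState { Missing, Null, Empty, Populated }

public static class FooListInspector
{
    // Classifies the "fooList" property of a JSON object into one of four states.
    public static ListState Classify(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);

        // Property is not physically present in the object.
        if (!doc.RootElement.TryGetProperty("fooList", out JsonElement list))
            return ListState.Missing;

        // Property is present but explicitly null.
        if (list.ValueKind == JsonValueKind.Null)
            return ListState.Null;

        // Assumption: any other non-array value violates the contract.
        if (list.ValueKind != JsonValueKind.Array)
            throw new JsonException("fooList must be an array or null");

        return list.GetArrayLength() == 0 ? ListState.Empty : ListState.Populated;
    }
}
```

Whatever behavior is chosen for each state, documenting it once and handling it in one place keeps clients consistent.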

I would also recommend reading up on how semantic versioning works. If you introduce a semantic versioning scheme to your version numbers, then backwards compatible changes can be made on a minor version boundary, while breaking changes can be made on a major version boundary (both clients and servers would have to agree on the same major version). While this isn't a property of the JSON itself, this is useful for communicating the types of changes a client should expect when the version changes.

monsur
  • so is it forbidden to remove properties in JSON node? what if my class does remove some variable? – Marson Mao Jan 22 '15 at 03:34
  • JSON does not forbid you from removing properties. BUT, if a client consumes that JSON, and a property suddenly disappears, that client may break. The goal of a versioning strategy is to allow an API to evolve while still keeping clients stable. – monsur Jan 22 '15 at 16:52
16

Google's Java-based Gson library has excellent versioning support for JSON. It could prove very handy if you are thinking of going the Java way.

There is a nice and easy tutorial here.

shashankaholic
  • I'm writing in C#, but the implementation should be possible in any language, otherwise it sort of misses the whole point... – Ted Apr 06 '12 at 11:40
9

It doesn't matter which serialization protocol you use; the techniques for versioning APIs are generally the same.

Generally you need:

  1. a way for the consumer to communicate to the producer the API version it accepts (though this is not always possible)
  2. a way for the producer to embed versioning information in the serialized data
  3. a backward-compatible strategy for handling unknown fields

In a web API, the API version that the consumer accepts is generally embedded in the Accept header (e.g. Accept: application/vnd.myapp-v1+json, application/vnd.myapp-v2+json means the consumer can handle either version 1 or version 2 of your API) or, less commonly, in the URL (e.g. https://api.twitter.com/1/statuses/user_timeline.json). This is generally used for major versions (i.e. backward-incompatible changes). If the server cannot produce any of the versions listed in the client's Accept header, the communication fails (or proceeds on a best-effort basis, or falls back to a default baseline protocol, depending on the nature of the application).

The producer then generates the serialized data in one of the requested versions and embeds this version info in the serialized data (e.g. as a field named version). The consumer should use the version information embedded in the data to determine how to parse it. The version information in the data should also contain the minor version (i.e. for backward-compatible changes). Consumers should generally be able to ignore the minor version and still process the data correctly, although understanding the minor version may allow the client to make additional assumptions about how the data should be processed.

A common strategy for handling unknown fields mirrors how HTML and CSS are parsed. When the consumer sees an unknown field, it should ignore it, and when the data is missing a field that the consumer expects, it should use a default value. Depending on the nature of the communication, you may also want to specify some fields as mandatory (i.e. a missing field is considered a fatal error). Fields added within a minor version should always be optional: a minor version can add optional fields or change field semantics as long as the change is backward compatible, while a major version can delete fields, add mandatory fields, or change field semantics in a backward-incompatible manner.
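The rules above (check the major version, ignore unknown fields, default optional fields, fail on missing mandatory fields) could be sketched in C# roughly as follows. This assumes System.Text.Json and a "major.minor" string in a field named version; the `CustomerMessage` and `TolerantReader` types, the field names, and the choice of exceptions are all hypothetical and only illustrate the idea.

```csharp
using System;
using System.Text.Json;

public sealed class CustomerMessage
{
    public string CustomerNumber { get; }
    public string County { get; }

    public CustomerMessage(string customerNumber, string county)
    {
        CustomerNumber = customerNumber;
        County = county;
    }
}

public static class TolerantReader
{
    // Assumption: this consumer understands major version 1 only.
    private const int SupportedMajor = 1;

    public static CustomerMessage Read(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        // Mandatory: version, formatted as "major.minor".
        if (!root.TryGetProperty("version", out JsonElement version)
            || version.ValueKind != JsonValueKind.String)
            throw new FormatException("Missing mandatory field: version");

        string[] parts = (version.GetString() ?? "").Split('.');
        if (!int.TryParse(parts[0], out int major) || major != SupportedMajor)
            throw new NotSupportedException("Unsupported major version");

        // Mandatory field: a missing value is a fatal error.
        if (!root.TryGetProperty("customerNumber", out JsonElement number)
            || number.ValueKind != JsonValueKind.String)
            throw new FormatException("Missing mandatory field: customerNumber");

        // Optional field (added in a minor version): default when absent.
        string county = "";
        if (root.TryGetProperty("county", out JsonElement c)
            && c.ValueKind == JsonValueKind.String)
            county = c.GetString() ?? "";

        // Any other property in the payload is never looked at, so it is ignored.
        return new CustomerMessage(number.GetString() ?? "", county);
    }
}
```

Note that the minor version is parsed out of the way but never consulted, which matches the idea that a consumer should be able to ignore it and still process the data.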

In an extensible serialization format (like JSON or XML), the data should be self-descriptive. In other words, the field names should always be stored together with the data; you should not rely on specific data being available at specific positions.

Lie Ryan
6

Don't use DataContractJsonSerializer. As the name says, objects processed through this class have to:

a) Be marked with [DataContract] and [DataMember] attributes.

b) Be strictly compliant with the defined "contract", that is, nothing less and nothing more than what is defined. Any extra or missing [DataMember] will make the deserialization throw an exception.

If you want more flexibility, use JavaScriptSerializer if you want to go for the cheap option... or use this library:

http://json.codeplex.com/

This will give you enough control over your JSON serialization.

Imagine you have an object in its early days.

public class Customer
{ 
    public string Name;

    public string LastName;
}

Once serialized it will look like this:

{ "Name": "John", "LastName": "Doe" }

If you change your object definition to add or remove fields, the deserialization will still go smoothly if you use, for example, JavaScriptSerializer.

public class Customer
{ 
    public string Name;

    public string LastName;

    public int Age;
}

If you try to deserialize the previous JSON into this new class, no error will be thrown. The catch is that your new fields will be set to their defaults. In this example, "Age" will be set to zero.

As a convention of your own, you can include a field in all your objects that contains the version number. That way you can tell the difference between an empty field and a version inconsistency.

So let's say you have your class Customer v1 serialized:

{ "Version": 1, "LastName": "Doe", "Name": "John" }

If you deserialize it into a Customer v2 instance, you will have:

{ "Version": 1, "LastName": "Doe", "Name": "John", "Age": 0 }

You can then detect which fields in your object are reliable and which are not. In this case you know that your v2 object instance comes from a v1 object instance, so the field Age should not be trusted.

I also have in mind that you could use a custom attribute, e.g. "MinVersion", and mark each field with the minimum version number that supports it, so you get something like this:

public class Customer
{ 
    [MinVersion(1)]
    public int Version;

    [MinVersion(1)]
    public string Name;

    [MinVersion(1)]
    public string LastName;

    [MinVersion(2)]
    public int Age;
}

Then later you can access this metadata and do whatever you need with it.
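"MinVersion" is not an existing .NET attribute, so it would have to be defined by you. A possible sketch, using only standard reflection, is shown below; the `MinVersionAttribute` definition and the `VersionInfo.UntrustedFields` helper are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical custom attribute: the first version in which a field exists.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class MinVersionAttribute : Attribute
{
    public int Version { get; }
    public MinVersionAttribute(int version) => Version = version;
}

public class Customer
{
    [MinVersion(1)] public int Version;
    [MinVersion(1)] public string Name;
    [MinVersion(1)] public string LastName;
    [MinVersion(2)] public int Age;
}

public static class VersionInfo
{
    // Returns the names of public fields that a payload of the given version
    // could not have carried, i.e. fields whose values are only defaults.
    public static List<string> UntrustedFields(Type type, int payloadVersion)
    {
        var result = new List<string>();
        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            MinVersionAttribute attr = field.GetCustomAttribute<MinVersionAttribute>();
            if (attr != null && attr.Version > payloadVersion)
                result.Add(field.Name);
        }
        return result;
    }
}
```

After deserializing a payload whose Version is 1, calling `VersionInfo.UntrustedFields(typeof(Customer), 1)` would report Age as a field that should not be trusted. The attributes live only on the C# side; the JSON on the wire is unaffected.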

Adrian Salazar
  • Thanks for the tip. However, I dont see any specific version handling there, except for the very bloated "ShouldSerialize"-approach. Is that what u are thinking of? – Ted Dec 18 '12 at 13:39
  • @Ted, I am just saying that (a) these serializers are flexible enough so they can handle any input. (b) if you have such flexibility, dealing with versioning is less critical. Check my edit. – Adrian Salazar Dec 19 '12 at 20:56
  • The problem is that Attributes on the C# is fine, but this is read by a JAVA-implementation in Android, where they dont have Attributes. – Ted Dec 20 '12 at 11:16
  • @Ted, plain json will travel to your Android program, no attributes at all. So no attributes will be read – Adrian Salazar Dec 20 '12 at 11:37
  • Regarding using of `DataContractJsonSerializer` with .NET 4.8: **a) is wrong**, you can serialize with just `[Serializable]` and all props and fields will be serialized; **b) is wrong**, adding or removing `[DataMember]` attribute does not disturb deserialization, only type change will do it. – Rekshino Jul 07 '21 at 10:21