I would like to create a data binding code generator for a specified programming language and a specified serialization format. Given a specification of the structure of the data to be serialized or deserialized, the code generator should generate the classes (in the specified programming language) that represent the given vocabulary, as well as the methods for serialization and deserialization using the specified format. The code generator would require the following inputs:
- the target programming language, that is, the language in which the code is generated;
- the target serialization format, that is, the format in which the data is serialized;
- the specification of the structure of the data to be serialized or deserialized.
Since I would initially like to create a simple code generator, the first version of this software could require only the specification of the structure of the data, so I chose C# as the target programming language and XML as the target serialization format. Essentially, the code generator should be a Java program that reads the specification of the structure of the data (written according to a given grammar) and generates the C# classes that represent the given vocabulary; these classes should have methods for serialization and deserialization in XML format. The purpose of the code generator is to generate one or more classes that can be embedded in a C# project.
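A generator like this usually separates parsing from code emission through an in-memory model of the specification. The following Java sketch shows one possible shape for that model; `SpecModel`, `Kind`, `FieldDef`, `TypeDef` and `Spec` are hypothetical names, not part of ANTLR or any library, and the design is an assumption rather than a prescribed one.

```java
import java.util.List;

// Hypothetical intermediate model of a parsed specification: the parser
// (hand-written or ANTLR-generated) would build it, and the C# emitter
// would walk it to produce source code.
public class SpecModel {

    // "simple" types hold built-in fields; "compound" types hold user-defined ones.
    enum Kind { SIMPLE, COMPOUND }

    // One field such as "int id"; constraint is null when none is given.
    record FieldDef(String type, String name, String constraint) {}

    // One definition such as "simple type Message: int id, string content".
    record TypeDef(Kind kind, String name, List<FieldDef> fields) {}

    // The whole specification file: every user-defined type, in declaration order.
    record Spec(List<TypeDef> types) {}

    public static void main(String[] args) {
        // The model that "simple type Message: int id, string content" maps to.
        TypeDef message = new TypeDef(Kind.SIMPLE, "Message", List.of(
                new FieldDef("int", "id", null),
                new FieldDef("string", "content", null)));
        Spec spec = new Spec(List.of(message));
        System.out.println(spec.types().get(0).name() + " has "
                + message.fields().size() + " fields");
    }
}
```

Keeping the model independent of both the parser and the C# output means a later version could add another target language or format by writing a new emitter over the same records.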
The specification of the structure of the data could be defined as in the following example:
```
simple type Message: int id, string content
```
Given the specification in the above example, the intended code generator could generate the following C# class:
```csharp
public class Message
{
    public int Id { get; set; }
    public string Content { get; set; }

    public byte[] Serialize()
    {
        // ...
    }

    public void Deserialize(byte[] data)
    {
        // ...
    }
}
```
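Given such a parsed type, producing the class above is mostly string building. The following Java sketch is one way to do it; `CSharpEmitter`, `emit` and `pascal` are hypothetical names, and the PascalCase conversion (`id` to `Id`) is an assumption based on the example. It emits `NotImplementedException` stubs for `Serialize` and `Deserialize` so that the generated file compiles while the XML logic is still undecided.

```java
import java.util.List;

// Hypothetical emitter: turns one parsed type into C# source text.
public class CSharpEmitter {

    record FieldDef(String type, String name) {}
    record TypeDef(String name, List<FieldDef> fields) {}

    // C# properties are conventionally PascalCase: "id" -> "Id".
    static String pascal(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    static String emit(TypeDef t) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(t.name()).append("\n{\n");
        for (FieldDef f : t.fields()) {
            // Auto-property, as in the hand-written Message class.
            sb.append("    public ").append(f.type()).append(' ')
              .append(pascal(f.name())).append(" { get; set; }\n");
        }
        // XML (de)serialization bodies are still to be designed, so emit
        // stubs that compile instead of empty bodies.
        sb.append("    public byte[] Serialize()\n    {\n")
          .append("        throw new System.NotImplementedException();\n    }\n");
        sb.append("    public void Deserialize(byte[] data)\n    {\n")
          .append("        throw new System.NotImplementedException();\n    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        TypeDef message = new TypeDef("Message", List.of(
                new FieldDef("int", "id"),
                new FieldDef("string", "content")));
        System.out.print(emit(message));
    }
}
```

For larger outputs, a template engine such as StringTemplate (which comes from the ANTLR project) is a common alternative to a `StringBuilder`.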
I read about ANTLR and I believe this tool is well suited to this purpose. As explained in this answer, I should first create a grammar for the specification of the structure of the data.
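With ANTLR, the grammar would live in a `.g4` file and ANTLR would generate the lexer and parser from it. Before setting that up, it may help to pin down what the grammar has to accept. The Java sketch below is a hand-written, regex-based stand-in (not ANTLR) for the one-line form `simple type Name: type field, type field`; `SpecParser` and `parseLine` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written parser for one type definition per line.
public class SpecParser {

    record FieldDef(String type, String name) {}
    record TypeDef(String kind, String name, List<FieldDef> fields) {}

    // "<simple|compound> type <Name>: <field list>"
    private static final Pattern HEADER =
            Pattern.compile("\\s*(simple|compound)\\s+type\\s+(\\w+)\\s*:\\s*(.+)");

    // "<type> <name>", e.g. "int id" or "DateTime time"
    private static final Pattern FIELD = Pattern.compile("\\s*(\\w+)\\s+(\\w+)\\s*");

    static TypeDef parseLine(String line) {
        Matcher h = HEADER.matcher(line);
        if (!h.matches()) {
            throw new IllegalArgumentException("Not a type definition: " + line);
        }
        List<FieldDef> fields = new ArrayList<>();
        for (String part : h.group(3).split(",")) {
            Matcher f = FIELD.matcher(part);
            if (!f.matches()) {
                throw new IllegalArgumentException("Bad field: " + part);
            }
            fields.add(new FieldDef(f.group(1), f.group(2)));
        }
        return new TypeDef(h.group(1), h.group(2), fields);
    }

    public static void main(String[] args) {
        System.out.println(parseLine("simple type Message: int id, string content"));
    }
}
```

Each pattern roughly corresponds to a parser rule the ANTLR grammar would need (a type definition and a field). Regular expressions stop scaling once lists, nesting or constraints appear, which is where a generated parser pays off.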
The Message example above is very simple because it defines only a simple type, but the specification could be more complex: for example, a compound type could include one or more simple types, lists, etc., as in the following example:
```
simple type LogInfo: DateTime time, String message
simple type LogSource: String class, String version
compound type LogEntry: LogInfo info, LogSource source
```
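Compound types add a semantic step: after parsing, the generator has to know every user-defined type so it can check that a field such as `LogInfo info` refers to something declared. A minimal Java sketch of that pass is below, assuming a fixed set of built-in types (`int`, `string`, `String`, `DateTime`); `TypeResolver`, `symbolTable` and `unresolved` are hypothetical names. With ANTLR, this logic would typically sit in a listener or visitor over the parse tree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical semantic pass run after parsing: it collects every
// user-defined type into a symbol table and reports field types that
// are neither built in nor declared somewhere in the specification.
public class TypeResolver {

    record FieldDef(String type, String name) {}
    record TypeDef(String kind, String name, List<FieldDef> fields) {}

    // Assumed set of built-in types; the spec mixes "string" and "String".
    static final Set<String> BUILTINS = Set.of("int", "string", "String", "DateTime");

    static Map<String, TypeDef> symbolTable(List<TypeDef> defs) {
        Map<String, TypeDef> symbols = new LinkedHashMap<>();
        for (TypeDef d : defs) {
            if (symbols.putIfAbsent(d.name(), d) != null) {
                throw new IllegalArgumentException("Duplicate type: " + d.name());
            }
        }
        return symbols;
    }

    // Returns "Type.field: UnknownType" for every unresolved reference.
    static List<String> unresolved(List<TypeDef> defs) {
        Map<String, TypeDef> symbols = symbolTable(defs);
        List<String> problems = new ArrayList<>();
        for (TypeDef d : defs) {
            for (FieldDef f : d.fields()) {
                if (!BUILTINS.contains(f.type()) && !symbols.containsKey(f.type())) {
                    problems.add(d.name() + "." + f.name() + ": " + f.type());
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<TypeDef> defs = List.of(
                new TypeDef("simple", "LogInfo", List.of(
                        new FieldDef("DateTime", "time"), new FieldDef("String", "message"))),
                new TypeDef("simple", "LogSource", List.of(
                        new FieldDef("String", "class"), new FieldDef("String", "version"))),
                new TypeDef("compound", "LogEntry", List.of(
                        new FieldDef("LogInfo", "info"), new FieldDef("LogSource", "source"))));
        System.out.println(symbolTable(defs).keySet());
        System.out.println(unresolved(defs));
    }
}
```

One related detail: `LogSource` has a field named `class`, which is a C# keyword. PascalCase (`Class`) avoids the clash for properties, and a verbatim identifier (`@class`) is the escape if the name must be kept as is.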
Moreover, the specification could also include one or more constraints, as in the following example:
```
simple type Message: int id (constraint: not negative), string content
```
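Supporting constraints mostly means extending the field rule with an optional annotation. The following Java sketch, again a regex-based stand-in rather than ANTLR, accepts `type name` optionally followed by `(constraint: text)`; `FieldParser` and `parseField` are hypothetical names, and the constraint text is kept as a raw string for a later stage to interpret.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical field parser with an optional "(constraint: ...)" suffix.
public class FieldParser {

    // constraint is null when the field has no annotation.
    record FieldDef(String type, String name, String constraint) {}

    // "<type> <name>" optionally followed by "(constraint: <text>)"
    private static final Pattern FIELD = Pattern.compile(
            "\\s*(\\w+)\\s+(\\w+)\\s*(?:\\(\\s*constraint\\s*:\\s*([^)]*?)\\s*\\))?\\s*");

    static FieldDef parseField(String text) {
        Matcher m = FIELD.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad field: " + text);
        }
        // group(3) is null if the optional constraint group did not match
        return new FieldDef(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        String body = "int id (constraint: not negative), string content";
        for (String part : body.split(",")) {
            System.out.println(parseField(part));
        }
    }
}
```

Because the example splits the field list on commas, a constraint text containing a comma would break this approach; a proper grammar has no such limitation.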
For a specification with such a constraint, the code generator could generate the following C# class:
```csharp
using System;

public class Message
{
    private int _id;
    private string _content;

    public int Id
    {
        get { return _id; }
        set
        {
            if (value < 0)
                throw new ArgumentException("...");
            _id = value;
        }
    }

    public string Content
    {
        get { return _content; }
        set { _content = value; }
    }

    public byte[] Serialize()
    {
        // ...
    }

    public void Deserialize(byte[] data)
    {
        // ...
    }
}
```
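Emitting the guarded property is then a matter of mapping each constraint phrase to the C# condition that violates it. In the following Java sketch, `ConstraintEmitter`, `emitProperty` and the `VIOLATION` table are hypothetical, and only `not negative` comes from the question; `positive` and `not empty` are illustrative extras. Unknown constraints are rejected at generation time rather than silently ignored.

```java
import java.util.Map;

// Hypothetical emitter for a single property whose setter enforces a constraint.
public class ConstraintEmitter {

    // Assumed mapping from constraint phrases to the C# condition that
    // violates them; only "not negative" appears in the question.
    static final Map<String, String> VIOLATION = Map.of(
            "not negative", "value < 0",
            "positive", "value <= 0",
            "not empty", "string.IsNullOrEmpty(value)");

    static String pascal(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Emits a backing field plus a property; constraint may be null.
    static String emitProperty(String type, String name, String constraint) {
        String prop = pascal(name);
        String field = "_" + name;
        StringBuilder sb = new StringBuilder();
        sb.append("    private ").append(type).append(' ').append(field).append(";\n");
        sb.append("    public ").append(type).append(' ').append(prop).append("\n    {\n");
        sb.append("        get { return ").append(field).append("; }\n");
        if (constraint == null) {
            sb.append("        set { ").append(field).append(" = value; }\n");
        } else {
            String violated = VIOLATION.get(constraint);
            if (violated == null) {
                throw new IllegalArgumentException("Unknown constraint: " + constraint);
            }
            sb.append("        set\n        {\n")
              .append("            if (").append(violated).append(")\n")
              .append("                throw new ArgumentException(\"")
              .append(prop).append(": ").append(constraint).append("\");\n")
              .append("            ").append(field).append(" = value;\n")
              .append("        }\n");
        }
        sb.append("    }\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(emitProperty("int", "id", "not negative"));
        System.out.print(emitProperty("string", "content", null));
    }
}
```

The generated file needs `using System;` at the top for `ArgumentException`.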
Essentially, the code generator should find all user-defined types, any constraints, etc. Is there a simple example of how to do this?