I've been working with the parsing module construct and have found myself really enamored with the declarative nature of it's data structures. For those of you unfamiliar with it, you can write Python code that essentially looks like what you're trying to parse by nesting objects when you instantiate.
Example:
ethernet = Struct("ethernet_header",
Bytes("destination", 6),
Bytes("source", 6),
Enum(UBInt16("type"),
IPv4 = 0x0800,
ARP = 0x0806,
RARP = 0x8035,
X25 = 0x0805,
IPX = 0x8137,
IPv6 = 0x86DD,
),
)
While construct doesn't really support storing values inside this structure (you can either parse an abstract Container into a bytestream or a bytestream into an abstract container), I want to extend the framework so that parsers can also store values as they parse through so that they can be accessed in dot notation ethernet.type.
But in doing so, think the best possible solution here would be a generic method of writing encoding/decoding mechanisms so that you could register an encode/decode mechanism and be able to produce a variety of outputs from the abstract data structure (the parser itself), as well as the output of the parser.
To give you an example, by default when you run an ethernet packed through the parser, you end up with something like a dict:
Container(name='ethernet_header',
destination='\x01\x02\x03\x04\x05\x06',
source='\x01\x02\x03\x04\x05\x06',
type=IPX
)
I don't want to have to parse things twice - I would ideally like to have the parser generate the 'target' object/strings/bytes in a configurable manner.
The root of the idea is that you could have various 'plugins' registered for consuming or processing structures so that you could programatically generate XML or Graphviz diagrams as well as being able to convert from bytes to Python dicts. The crux of the task is, walk a tree of nodes and based on the encoder/decoder, transform and return the transformed objects.
So the question is essentially - what patterns are best suited for this purpose?
codec style:
I've looked at the codecs module, which is fairly elegant in that you create encoding mechanisms, register that your class can encode things and you can specify the specific encoding you want on the fly.
'blah blah'.encode('utf8')
serdes (serializer, deserializer):
There's are a couple examples of existing serdes modules for Python, off the top of my head JSON springs to mind - but the problem with it is that it's very specific and doesn't support arbitrary formats easily. You can either encode or decode JSON and thats basically it. There's a variety of serdes constructed like this, some use the loads,*dumps* methods, some don't. It's a crapshoot.
objects = json.loads('{'a': 1, 'b': 2})
visitor pattern(?):
I'm not terribly familiar with the visitor pattern, but it does seem to have some mechanisms that might be applicable - the idea being that (if I understand this correctly), you'd setup a visitor for nodes and it'd walk the tree and apply some transformation (and return the new objects?).. I'm hazy here.
other?:
Are there other mechanisms that might be more pythonic or already be written? I considered perhaps using ElementTree and subclassing Elements - but I wanted to consult stackoverflow before doing something daft.