
When reading XML, should I try to model Objects (in OO sense) or leave the content as XML entities?

I am trying to decide between two approaches when reading "strongly typed" (schema-based) XML content using object-oriented languages:

  1. First, I would create a typed class hierarchy to represent every possible element type of that schema, with its typed properties and all. Then, when parsing a document, I would recursively scan every node and create the proper class instance, reproducing all the nesting, attributes, children (as collections), etc. I could then manipulate this object tree however I want in my application, and when saving back I would have to call the objects' toXml() method, or otherwise "convert" the objects back to XML.

  2. Using some off-the-shelf XML library (every high-level language has one or more), I would parse the document and have its structure already in memory, i.e. a tree of Nodes. I would then manipulate the nodes directly and could save everything back to file using the library's methods. Also, if my application needs its own representation of the data, I could create proxy objects whose properties and methods actually refer to the underlying Node structure.
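A minimal Python sketch of the proxy idea in approach 2, assuming the standard-library `xml.etree.ElementTree` as the off-the-shelf library. The `PersonProxy` class and the `<people>`/`<person>` layout are hypothetical. The proxy holds no data of its own, so every edit goes straight into the node tree, which the library can then serialize unchanged.

```python
import xml.etree.ElementTree as ET


class PersonProxy:
    """Hypothetical proxy: properties read and write the underlying Element."""

    def __init__(self, element: ET.Element):
        self._el = element  # the library's node stays the single source of truth

    @property
    def name(self) -> str:
        return self._el.get("name", "")

    @name.setter
    def name(self, value: str) -> None:
        self._el.set("name", value)

    @property
    def age(self) -> int:
        return int(self._el.findtext("age", default="0"))

    @age.setter
    def age(self, value: int) -> None:
        age_el = self._el.find("age")
        if age_el is None:  # create the child element if the document lacks it
            age_el = ET.SubElement(self._el, "age")
        age_el.text = str(value)


doc = ET.fromstring('<people><person name="Ann"><age>41</age></person></people>')
ann = PersonProxy(doc.find("person"))
ann.age += 1        # reads "41" from the node, writes "42" back into it
ann.name = "Anne"   # updates the attribute on the node
print(ET.tostring(doc, encoding="unicode"))
```

Saving is just the library's own serializer; the proxy layer never needs a `toXml()` of its own.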

The questions are: how is it usually done? Is there a "right" way to map between XML and OO and back? Is there a well-known way XML is supposed to be used by OO, or a way OO is supposed to use XML?

heltonbiker

2 Answers


The first approach makes sense if you are using XML as a serialization format for your objects. In this case, the XML will contain all the information needed to create your object structure. There should be a one-to-one mapping between objects and XML nodes, and every object should be responsible for converting itself to and from DOM data.
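A minimal sketch of this serialization case in Python, assuming `xml.etree.ElementTree` as the DOM-like layer. The `Album`/`Track` classes and their XML layout are hypothetical. Each class maps one-to-one onto an element and knows how to build itself from, and convert itself back to, an `Element`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Track:
    title: str
    seconds: int

    @classmethod
    def from_element(cls, el: ET.Element) -> Track:
        return cls(title=el.get("title", ""), seconds=int(el.get("seconds", "0")))

    def to_element(self) -> ET.Element:
        return ET.Element("track", title=self.title, seconds=str(self.seconds))


@dataclass
class Album:
    name: str
    tracks: list[Track] = field(default_factory=list)

    @classmethod
    def from_element(cls, el: ET.Element) -> Album:
        # Children become a typed collection; each child converts itself.
        return cls(name=el.get("name", ""),
                   tracks=[Track.from_element(t) for t in el.findall("track")])

    def to_element(self) -> ET.Element:
        el = ET.Element("album", name=self.name)
        el.extend([t.to_element() for t in self.tracks])
        return el


xml_text = '<album name="Demo"><track title="Intro" seconds="95"/></album>'
album = Album.from_element(ET.fromstring(xml_text))
album.tracks.append(Track("Bonus", 60))  # manipulate plain objects in the app
print(ET.tostring(album.to_element(), encoding="unicode"))
```

Because the objects are plain dataclasses, a round trip (`from_element(to_element(x)) == x`) is an easy sanity check for the mapping.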

The second approach is better if your objects only require partial information from the XML. Another use case would be a legacy XML schema, maybe with design flaws, that you do not want to directly map to your object structure.
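For the partial-information or legacy-schema case, a sketch (again assuming `xml.etree.ElementTree`) might read only the fields the application cares about into a small type whose names need not mirror the schema. The legacy element names and `CustomerSummary` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSummary:
    """Hypothetical app-side type: only what the app needs, not the whole schema."""
    customer_id: str
    email: str


LEGACY_XML = """<CUST_REC>
  <HDR><ID>C-17</ID><CRT_DT>1999-01-01</CRT_DT></HDR>
  <CONTACT><EMAIL_1>ann@example.com</EMAIL_1><FAX>n/a</FAX></CONTACT>
  <NOTES>free text nobody uses</NOTES>
</CUST_REC>"""


def read_summary(xml_text: str) -> CustomerSummary:
    root = ET.fromstring(xml_text)
    # Map awkward legacy paths onto clean field names; everything else is ignored.
    return CustomerSummary(
        customer_id=root.findtext("HDR/ID", default=""),
        email=root.findtext("CONTACT/EMAIL_1", default=""),
    )


print(read_summary(LEGACY_XML))
```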

By the way, for both approaches I strongly recommend using an off-the-shelf XML library. I am not sure why you think this can only be done for the second one.

Frank Puffer
  • Thanks for your answer. Although I did not say so in the question, I would use the libraries in the first approach too. The difference I meant is this: either assemble actual application objects from lower-level XML-parser calls (a SAX-like traversal) and delegate management of the object tree to the application, or leave the representation of the underlying structure to the higher-level, single-pass "parse" result from the library. – heltonbiker Apr 09 '16 at 15:58
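The SAX-like alternative described in the comment above can be sketched with Python's standard `xml.sax` module: the handler builds application objects directly from parse events, so the object tree is owned by the application and no library-side tree is kept. The `Album`/`Track` types and the element names are hypothetical.

```python
from __future__ import annotations

import xml.sax
from dataclasses import dataclass, field


@dataclass
class Track:
    title: str
    seconds: int


@dataclass
class Album:
    name: str
    tracks: list[Track] = field(default_factory=list)


class AlbumHandler(xml.sax.ContentHandler):
    """Builds Album/Track objects directly from SAX events; no DOM is kept."""

    def __init__(self):
        super().__init__()
        self.albums: list[Album] = []

    def startElement(self, name, attrs):
        if name == "album":
            self.albums.append(Album(name=attrs.get("name", "")))
        elif name == "track" and self.albums:
            # Attach the track to the most recently opened album.
            self.albums[-1].tracks.append(
                Track(title=attrs.get("title", ""),
                      seconds=int(attrs.get("seconds", "0"))))


handler = AlbumHandler()
xml.sax.parseString(
    b'<library><album name="Demo"><track title="Intro" seconds="95"/></album></library>',
    handler)
print(handler.albums)
```

Writing the objects back to XML is then entirely the application's job, which is the trade-off the comment describes.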

Your #1 is reinventing JAXB. Your #2 is reinventing DOM.

Use #1 when you wish to operate against an OO representation of your domain.

Use #2 when you wish to operate against an XML representation of your domain. Consider also as alternatives to #2:

  • event-based XML parsers
  • XSLT-based XML transformations

Take care not to slide into #1 due only to familiarity. True justifications for #1 should be based on a legitimate need for an OO design to be the centerpiece of an architecture. XML-to-XML mappings, even complex ones, need never go through an intermediary OO representation unless substantial processing has to happen in the OO realm. Pure XML-to-XML transformations can be handled elegantly in XSLT alone.

kjhughes
  • Very nice, thanks! I am actually interested in the KML file format and architecture. The application would be a map editor, where I would display and edit geometries, reorganize elements inside a tree view, and split and merge files, very similar to Google MyMaps. I believe an object model isn't actually needed, since KML is already quite comprehensive in its data types. I would surely need "view objects" in a View-Model style architecture, but nothing prevents using the XML model directly as the model and implementing the View with GUI toolkit primitives. What do you think? – heltonbiker Apr 09 '16 at 16:07
  • Also, if you don't mind: I was struck by your contrast between "OO representation" and "XML representation" as two almost mutually exclusive ways to represent a domain. Would you recommend any resource for studying these two design options at a more abstract, conceptual level? I believe this is the real problem hidden in my question, and it is what I would have to solve first; everything else would follow. – heltonbiker Apr 09 '16 at 16:09
  • (Excuse me, but I have one more question:) When you say that "event-based parsers are an alternative to an XML representation of your domain", I can only think of "SAX vs DOM". I believe DOM is a perfect way to create an in-memory, editable version of an XML file, and as such is suitable for representing a domain structure. Also, in my reading, SAX and DOM are presented as quite the opposite of one another. If you care to elaborate on that, I would appreciate it a lot! :o) – heltonbiker Apr 09 '16 at 16:16
  • KML: Yeah, probably share the model but code outside XML. OO vs XML: Favor OO for record-oriented data and XML for document-oriented data. [SAX vs DOM](http://stackoverflow.com/questions/6828703/what-is-the-difference-between-sax-and-dom) is well understood -- just meant to extend your consideration to other XML processing models beyond DOM. Good luck. – kjhughes Apr 09 '16 at 20:19
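Regarding the KML discussion in the comments, here is a minimal sketch of using the parsed XML itself as the model, assuming the standard-library `xml.etree.ElementTree` rather than any dedicated KML library. The sample document is made up; the namespace URI is the standard KML 2.2 one. Placemark names and coordinates are read and edited in place, and the tree is serialized back without a parallel object model.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML_NS}
# Write the default namespace back without an auto-generated "ns0:" prefix.
ET.register_namespace("", KML_NS)

KML_TEXT = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Home</name><Point><coordinates>-43.2,-22.9,0</coordinates></Point></Placemark>
<Placemark><name>Work</name><Point><coordinates>-43.1,-22.8,0</coordinates></Point></Placemark>
</Document></kml>"""

root = ET.fromstring(KML_TEXT)
points = []
for pm in root.iterfind(".//kml:Placemark", NS):
    name = pm.find("kml:name", NS)
    # KML coordinates are "lon,lat[,alt]".
    lon, lat, *_ = pm.findtext(".//kml:coordinates", namespaces=NS).strip().split(",")
    points.append((name.text, float(lon), float(lat)))
    name.text = name.text.upper()  # edit the model in place; no parallel object tree

out = ET.tostring(root, encoding="unicode")
print(points)
print(out)
```

A view layer could hold references to these `Element` nodes much like the proxy idea in the question.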