XML best practices: planning for extensibility

Question

I'm currently creating an XML format. I have big ideas for where I want this to go, but in the beginning I would like to start small and allow for extensibility. I've done a lot of reading on XML attributes vs. XML elements: when to use them, the pros and cons of both, etc. The general consensus (see here and here) seems to be to use XML elements when possible, unless you are absolutely sure a piece of data is atomic and would never need to be extended, or it truly is metadata about how to treat an element.

So here is my question. Say I follow this guidance and create a basic XML document such as this.

<person>
     <name>John Doe</name>
</person>

I've followed what I thought was best practice. This is the most basic form that works for now, and I placed the data in an element rather than an attribute in case I want to extend it later. Now let's say I've been using this format for a while and I want to extend it. How do I do that without breaking any existing processes that expect a full name in the innertext of the name element?

If I extended it like this.

<person>
    <name>John Doe
        <firstname>John</firstname>
        <lastname>Doe</lastname>
        <alias>John Doe</alias>
    </name>
</person>

It would break existing processes, because now the innertext of 'name' is going to be something like "John DoeJohnDoeJohn Doe". I know there are ways to deal with this, but the point is to not break existing things that expect the innertext to contain the full name.
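To make the breakage concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The `full_name` function is a hypothetical stand-in for an existing consumer, not anyone's real code, and the exact concatenated string depends on how a given parser treats whitespace, so the sketch only prints it.

```python
import xml.etree.ElementTree as ET

# The original format and the extended one from the question.
V1 = "<person><name>John Doe</name></person>"
V2 = """<person>
    <name>John Doe
        <firstname>John</firstname>
        <lastname>Doe</lastname>
        <alias>John Doe</alias>
    </name>
</person>"""


def inner_text(element):
    """Concatenate every text node under element, like an innertext property."""
    return "".join(element.itertext())


def full_name(document):
    """Hypothetical v1 consumer: reads the inner text of <name>."""
    return inner_text(ET.fromstring(document).find("name"))


if __name__ == "__main__":
    print(repr(full_name(V1)))  # the plain full name
    print(repr(full_name(V2)))  # full name plus the text of every child
```

The consumer did nothing wrong by v1's rules; it is the mixed content (text plus child elements) that changes what "the text of name" means.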

The only easy way I can think of to extend this is to make the new values attributes of 'name'. But what if I wanted additional complexity, like multiple 'alias' values? That wouldn't be possible with attributes.

<person>
     <name firstname="John" lastname="Doe" alias="John Doe">John Doe</name>
</person>
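The limit on attributes is a well-formedness rule: an attribute name may appear only once per element. A small Python sketch (the `is_well_formed` helper is made up for illustration) shows a standard parser rejecting a second 'alias' attribute:

```python
import xml.etree.ElementTree as ET

ONE_ALIAS = '<name alias="John Doe">John Doe</name>'
TWO_ALIASES = '<name alias="John Doe" alias="Jon Doe">John Doe</name>'


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(ONE_ALIAS))
    print(is_well_formed(TWO_ALIASES))
```

Packing several aliases into one delimited attribute value would parse, but every consumer would then have to agree on the delimiter.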

It seems like the only way to really extend this without breaking existing processes would be to choose a new element name.

<person>
    <name>John Doe</name>
    <extendedname>
        <firstname>John</firstname>
        <lastname>Doe</lastname>
        <alias>John Doe</alias>
        <alias>Jon Doe</alias>
    </extendedname>
</person>

So this would work and solve my problem, but I'm asking myself, "Why exactly was it important for 'name' to be an element instead of an attribute?" It seems to me that in the end it didn't matter whether 'name' was an attribute of 'person' or a child element with innertext, because either way I had to use a new element name.

It occurs to me that a hybrid approach like this would be the most flexible and allow for maximum extensibility but I couldn't really find an example of someone doing this. If you started with this...

<person>
     <name value="John Doe" />
</person>

It could easily turn into this without breaking any existing processes and still allows for even further extension.

<person>
    <name value="John Doe">
        <firstname value="John" />
        <lastname value="Doe" />
        <alias value="John Doe" />
        <alias value="Jon Doe" />
    </name>
</person>
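As a rough check of that claim, the Python sketch below reads both versions with the same code. The function names are hypothetical, and the extended document assumes 'name' is written with an opening and a closing tag so that it can hold children.

```python
import xml.etree.ElementTree as ET

BEFORE = '<person><name value="John Doe" /></person>'
AFTER = """<person>
    <name value="John Doe">
        <firstname value="John" />
        <lastname value="Doe" />
        <alias value="John Doe" />
        <alias value="Jon Doe" />
    </name>
</person>"""


def full_name(document):
    """Hypothetical v1 consumer: only looks at name/@value."""
    return ET.fromstring(document).find("name").get("value")


def aliases(document):
    """Hypothetical newer consumer: collects every alias under name."""
    root = ET.fromstring(document)
    return [alias.get("value") for alias in root.findall("name/alias")]


if __name__ == "__main__":
    print(full_name(BEFORE), full_name(AFTER))
    print(aliases(BEFORE), aliases(AFTER))
```

The old consumer is unaffected because the attribute it reads never shares a slot with the new child elements.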

It seems to me that the guidance should be: use elements where possible, and put the values in some sort of 'value' attribute within the tag. As always, common sense should apply, and you should use innertext when it's appropriate, such as for a message, memo, or notes field.

Am I failing to understand some critical design element in the first example that makes it a better approach? Has anyone had to extend an XML schema while maintaining backward compatibility and run into the same problems or solutions? Any guidance or pro tips would be appreciated.

That is to say: Have a namespace for your version-1 content. If you make an incompatible change, use a new namespace. You can then contain content from both namespaces in a single document, if you want to generate something compatible with both versions, and/or define and document your own application logic for how mixed documents shall be handled. — Charles Duffy, Jul 15 '16 at 15:46
Yes, namespaces, I've used them, but then you have to generate 2 documents, right? 1 document in the old namespace, 1 in the new namespace. And at that point you haven't really extended anything, have you? You've just created a new format. – Lavaftw, Jul 15 '16 at 15:51
"Have to generate two documents"? That's only if you don't define your v2 parsers to still support v1 elements. — Charles Duffy, Jul 15 '16 at 15:52
...if you couldn't have mixed-namespace documents, that would rather defeat the point. — Charles Duffy, Jul 15 '16 at 15:53

score 1 · Answer 1 · answered Jul 15 '16 at 15:51

Assuming that you build your parsers to recognize prior releases' container elements, you can do this:

<doc xmlns:v1="http://example.com/yourformat/1.0"
     xmlns:v2="http://example.com/yourformat/2.0">
  <v1:person>
    <v1:name>John Doe</v1:name>
    <v2:name>
      <v2:firstname>John</v2:firstname>
      <v2:lastname>Doe</v2:lastname>
      <v2:alias>John Doe</v2:alias>
      <v2:alias>Jon Doe</v2:alias>
    </v2:name>
  </v1:person>
</doc>

Obviously, make v2 the default xmlns if you don't like all the prefixes.

This way, even with v2 content added, your document is still perfectly valid to a v1 parser.
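As an illustration only, here is how two namespace-aware readers might treat that document in Python with xml.etree.ElementTree. The function names and the `V1_ONLY` document are assumptions made for this sketch; the namespace URIs are the placeholders from the example above.

```python
import xml.etree.ElementTree as ET

NS = {
    "v1": "http://example.com/yourformat/1.0",
    "v2": "http://example.com/yourformat/2.0",
}

MIXED = """<doc xmlns:v1="http://example.com/yourformat/1.0"
     xmlns:v2="http://example.com/yourformat/2.0">
  <v1:person>
    <v1:name>John Doe</v1:name>
    <v2:name>
      <v2:firstname>John</v2:firstname>
      <v2:lastname>Doe</v2:lastname>
      <v2:alias>John Doe</v2:alias>
      <v2:alias>Jon Doe</v2:alias>
    </v2:name>
  </v1:person>
</doc>"""

V1_ONLY = """<doc xmlns:v1="http://example.com/yourformat/1.0">
  <v1:person><v1:name>John Doe</v1:name></v1:person>
</doc>"""


def v1_full_name(document):
    """v1 reader: asks only for v1:name, so v2 content is never seen."""
    person = ET.fromstring(document).find("v1:person", NS)
    return person.findtext("v1:name", namespaces=NS)


def v2_name_parts(document):
    """v2 reader: returns the structured name, or None for v1-only input."""
    person = ET.fromstring(document).find("v1:person", NS)
    name = person.find("v2:name", NS)
    if name is None:
        return None
    return {
        "first": name.findtext("v2:firstname", namespaces=NS),
        "last": name.findtext("v2:lastname", namespaces=NS),
        "aliases": [alias.text for alias in name.findall("v2:alias", NS)],
    }


if __name__ == "__main__":
    print(v1_full_name(MIXED))
    print(v2_name_parts(MIXED))
    print(v2_name_parts(V1_ONLY))
```

This only holds for readers that select elements by namespace; a reader that matches on local names alone would see two 'name' elements.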

Charles Duffy

Yes, this would work, though perhaps a little more background is required. I'll be gathering data from servers and then writing it to an XML file, which then gets pulled off, stored, and ingested into a database for additional reporting. There are many teams and individuals involved, with varying levels of skill. I can deal with namespaces, but chances are good someone is just going to grab the data, do a $Report = [xml](Get-Content report.xml), and parse through it without any regard for namespaces. Hence my desire to keep 1 namespace and just plan for it to be extendable. – Lavaftw Jul 15 '16 at 15:56
At the risk of flippancy: If you want a simple, dumb format anyone can use without paying much attention to what they're doing, might I direct you to JSON? – Charles Duffy Jul 15 '16 at 15:57
So I concede the point. XML namespaces are the right way to do this, but my question was really about the general guidance that one should use elements when possible because they are more extendable. If you cannot extend an element with inner text without creating a new namespace, it's really no different than if you had used an attribute and then decided it should be a complex element that requires a new namespace. Anyway, I do appreciate your insight. – Lavaftw Jul 15 '16 at 16:50
