
I have to specify a JSON data structure that will be part of an interface description; the data will be processed by JavaScript. JSON is fixed as the transmission format. In other projects, where we used XML instead of JSON, I used rich XML schemas for this. Unfortunately, I cannot do that now.

I did some research and found JSON Schema. However, it still has draft status, which makes me a bit uneasy about using it in this context.

I also came across this question discussing how to map XML to JSON. There seems to be a standard (?) conversion in the XML class in the org.json namespace. It appears that the conversion is rather straightforward for XML documents without mixed content.

So the idea is to use XML Schema to describe the data structure, use our existing XML processing (editing, transformation, validation, ...) tools as long as possible on the server side and convert the XML DOM to JSON just before delivering the data to the JSON consumer.

Data transmission is one-way only and we would not have mixed-content XML.
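To make the last step concrete, here is a minimal Python sketch of a non-schema-aware conversion for XML without mixed content. It is not the org.json implementation, only an illustration of the general idea; the sample document, the function names and the "#text" key are assumptions made up for this example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert an element without mixed content to a JSON-compatible value."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        # Leaf element: keep its text as a string (no schema types here).
        return text

    result = dict(elem.attrib)  # attributes become plain keys
    if text:
        result["#text"] = text  # hypothetical key for text next to attributes
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # A repeated name turns into a list on its second occurrence.
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_value(root)})


# Made-up sample document, standing in for the server-side XML.
ORDER_XML = """
<order id="42">
  <customer>ACME</customer>
  <item><sku>A1</sku><qty>2</qty></item>
  <item><sku>B7</sku><qty>1</qty></item>
</order>
"""

print(xml_to_json(ORDER_XML))
```

Note that this sketch keeps every value as a string and decides between object and array purely from what it sees in the instance document, because it never looks at the schema.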

Maybe someone has tried this before? Would that be a practical approach, in the sense that the semantics of the XML Schema are still clear enough for the client-side programmers when (conceptually) applied to the JSON document? Are there any particular pitfalls to be aware of?

Fabian

2 Answers


If I understood your idea correctly, you want to use XML Schema as the primary model for your data exchange, for the XML as well as the JSON format.

This idea has two parts:

  • Use single source to model all the data exchange.
  • Use XML Schema as this single source.

Single-source model

The first idea brings you to MDD (Model-Driven Development) or MDA (Model-Driven Architecture), which were hyped around 2002-2005. It was a UML-heavy, vendor-driven hype, but quite a few reasonable things (like AndroMDA) survived.

Generally, MDA is a good idea. It works splendidly as long as you do "standard" things. But it can be a nightmare if you want to "customize".

In your case, I would definitely say that a single-source model makes sense. This is about data exchange. At its core, this can be reduced to very simple models which are still powerful enough to express everything you need.

JSON is an example of this. JSON is even simpler than XML but still powerful enough. It clearly shows that as long as you have basic primitive types, objects, arrays and nesting, you can express almost anything.

This "single-source model" does not necessarily have to be UML; it can be anything powerful enough to cover all the underlying requirements.

The main problem with a "single-source model" is customization. You know, 90% works very well out of the box, but in the remaining 10% you don't get the result you want and have to customize, and that is where the effort gets you. Most generation tools have some kind of "plugins". So if you fit in the 90%, you're lucky; otherwise you may need to get to know the hairy internals of the generation tools.

To sum up, a single-source model is a good idea as long as it serves all the needs AND the effort to adapt/apply it to the required scenarios is not greater than building things from scratch.

XML Schema as the model

The next question is whether XML Schema is a good single-source model.

You have probably heard of or used JAXB, which has a schema compiler (XJC). This compiler can take your XML Schema and generate Java classes with JAXB annotations. These classes can then be used to unmarshal XML into Java objects or marshal these objects to XML.

And to JSON:

JAXB Mapping to JSON

Looks like you can also produce a JSON Schema from these classes (haven't tried it myself though):

How to generate JSON schema from a JAXB annotated class?

So the XML Schema-first approach works. You can call it schema-driven development (I, hereby, claim the copyright on this term).

I personally did a lot of things schema-first and wrote a number of tools/plugins for XJC. For instance:

  • Hyperjaxb makes schema-derived classes persistable with JPA.
  • Jsonix is basically a JAXB port for pure JavaScript.

My experience is that you can do a lot of things schema-first, but I also have to say that XML Schema is good but not the best or simplest model. The specification is complex, and if you take a look at the schema-derived classes, you can spot a few constructs which don't fit well with Java beans and properties. For instance, @XmlElementRef is a complex and often weird-looking construct, which is still necessary to cover quite a number of cases you can easily express in XML Schema. In all the tools I wrote, I always had to fight with cases, corner cases and corner cases of corner cases of such constructs.

XML Schema, if you keep it simple and neat, may be beautiful. It maps perfectly to beans and properties, is easy to understand and work with, and has a lot of tool support. So XML Schema is not the worst choice to model or specify data exchange.

But it can also get as complex as hell. I have seen a lot of overengineered schemas, which are then extremely hard to work with, for very little gain. Sometimes schema designers just don't know XML Schema well enough; sometimes they know it too well. The last time I helped to work out "XML Schema design best practices", we ended up with a document of 60-something pages of dos and don'ts. So it's easy to get XML Schemas wrong.

But still, as I said above, if it's kept simple and clean, it may be beautiful.

What are the alternatives?

Well, you may actually use your Java code as your model source. Annotated POJOs are expressive and versatile enough, but still quite simple to work with. You are not schema-first then, you're Java code-first, but you can still do all the same tricks. You can generate an XML Schema based on your annotated classes. You can do persistence (and much more) with MOXy. You can do JSON just as well.
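As a rough, language-neutral illustration of the code-first idea, here is a small Python sketch in which plain classes are the single source and both XML and JSON are derived from them. This is only an analogy, not JAXB or MOXy; the Order and Item classes and the hand-written to_xml and to_json helpers are hypothetical and stand in for what an annotation-driven tool would do for you.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    id: str
    # Declared as a list, so it stays a list even with a single item.
    items: List[Item] = field(default_factory=list)


def to_json(order):
    """Serialize the model to JSON; types come from the class definitions."""
    return json.dumps(asdict(order))


def to_xml(order):
    """Serialize the same model to XML (hand-written mapping for this sketch)."""
    root = ET.Element("order", id=order.id)
    for item in order.items:
        node = ET.SubElement(root, "item")
        ET.SubElement(node, "sku").text = item.sku
        ET.SubElement(node, "qty").text = str(item.qty)
    return ET.tostring(root, encoding="unicode")


order = Order(id="42", items=[Item("A1", 2)])
print(to_json(order))
print(to_xml(order))
```

Because the classes carry the structure and the types, both outputs follow from one definition, which is the property that makes either a schema-first or a code-first model attractive.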

To sum up and answer your question:

  • Yes, it is practical, and is known to work fairly well.
  • Along with the schema-first approach, also consider the Java-first approach.
  • You have tools to get XML-Objects-JSON-Persistence.
  • There are pitfalls (see above).

Hope this helps.

lexicore

Since no one had answered this question and we have started to follow this approach, here is a quick summary: for us, the approach generally works quite well. We have designed a very rich XML Schema that serves as part of the contract between the server and the web client. The JSON follows the XML one-to-one, so the XML Schema reads naturally for the JSON document, too.

The only minor problem we noticed is that the canonical XML-to-JSON transformation that we use (which is not schema-aware) creates a single object when there is just one child element somewhere in the tree, even when the XML Schema allows that element to occur many times (a maxOccurs greater than 1). This means that the programmers have to handle some polymorphism between single object values and collections on the JSON side.
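One way to deal with this is to normalize the JSON after conversion, using the element names that the schema declares as repeatable. The following Python sketch shows the idea under that assumption; the REPEATABLE set, the function names and the sample documents are made up for illustration, and a real list would have to be derived from the XML Schema.

```python
import json


def as_list(value):
    """Return value as a list, whether it arrived as one item, many, or none."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def force_lists(node, repeatable):
    """Recursively wrap the values of keys named in `repeatable` into lists."""
    if isinstance(node, list):
        return [force_lists(item, repeatable) for item in node]
    if isinstance(node, dict):
        fixed = {}
        for key, value in node.items():
            value = force_lists(value, repeatable)
            fixed[key] = as_list(value) if key in repeatable else value
        return fixed
    return node


# Hypothetical: names the schema declares with maxOccurs greater than 1.
REPEATABLE = {"item"}

# The same structure as it arrives with one child and with two children.
one = json.loads('{"order": {"item": {"sku": "A1"}}}')
two = json.loads('{"order": {"item": [{"sku": "A1"}, {"sku": "B7"}]}}')

for doc in (one, two):
    fixed = force_lists(doc, REPEATABLE)
    print([item["sku"] for item in fixed["order"]["item"]])
```

This sketch matches on element names only, so it would not be enough if the same name is repeatable in one context and single-valued in another; in that case the lookup would have to be keyed by path.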

Fabian