I have XML files following an XSD and I need to transform them into JSON.

The files are typically like this

example.xml :

<object name="foo">
  <values>one</values>
  <values>two</values>
  <values>three</values>
  <param attr="2" value="true" />
</object>

Which translates into this JSON:

{
  "name" : "foo",
  "values" : [
    "one",
    "two",
    "three"
  ],
  "param" : {
    "attr" : "2",
    "value" : "true"
  }
}

This is almost fine, except that I would like the data to be typed, so that param becomes :

  "param" : {
    "attr" : 2,
    "value" : true
  }

The XML files reference an XSD schema that defines the data type for each element or attribute, such as

<xs:attribute name="attr" type="xs:integer"

The XML-to-JSON transformation uses XML::Simple to read the XML into a Perl hash, and the JSON module to encode that hash as JSON.

How could I do the same job but using the definitions from the XSD Schema to load the XML with the right type for each field?

I need to use the XSD because a text field may happen to contain only digits, so its type cannot be guessed from the value alone.

Blake_ch
  • 2
    ***You can't do this,*** and untyped data is the least of your worries. In general, an XML document has no equivalent JSON string. You would need to do a lot of checking to make sure that there is no loss of information if you try to do this. Why do you think this is necessary? XML is as portable as JSON and more, and there is an XML library for the majority of popular programming languages. – Borodin Dec 05 '16 at 16:32
  • 3
    The [documentation for `XML::Simple`](https://metacpan.org/pod/XML::Simple) has this. *"You really don't want to use this module in new code"* and *"The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces"*. `XML::Simple` is a very long way from being able to take account of an XSD schema. – Borodin Dec 05 '16 at 16:39
  • 3
    You'll probably end up having to navigate the XML and the XSD in parallel. If that's the case, this is a lot of work and far beyond the scope of SO. It's probably faster to create a non-generic solution (i.e. one that doesn't actually read the XSD). – ikegami Dec 05 '16 at 16:54
  • Thanks for your input. This is right. For various reasons I need to keep both formats, so the way to go will be developing my own tool. – Blake_ch Dec 07 '16 at 11:50

1 Answer

Well, the summary answer is - you can't do what you're trying to do, the way you're trying to do it.

XML is a 'deeper' structure than JSON, in that it has style sheets and attributes, so inherently you'll be discarding data in the transformation process. How much is acceptable to discard is always going to be a case by case sort of thing.

More importantly - both XML and JSON are designed for similar use cases - to be machine readable/parsable. Almost everything that can 'read' JSON programmatically can also read XML, because libraries for both are generally available.

And most importantly of all - don't use XML::Simple. The "Simple" doesn't mean it is simple to use; it means it is for "Simple" XML, which yours isn't. Both XML::Twig and XML::LibXML are much better tools for almost any XML parsing job.
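
As a rough illustration of the XML::LibXML route, here is a minimal sketch that reads the example from the question and applies types from a small lookup table. The table `%TYPE_OF` and the helper `typed` are hypothetical names, and the table is copied by hand from the XSD rather than read from it, so this is a non-generic solution for this one document shape, not a schema-driven converter.

```perl
use strict;
use warnings;
use XML::LibXML;
use JSON::PP;

# Hypothetical type table, written by hand from the XSD (the XSD is NOT parsed here).
my %TYPE_OF = (
    'param/@attr'  => 'integer',
    'param/@value' => 'boolean',
);

# Convert a text value according to the table; unknown keys stay strings.
sub typed {
    my ($key, $text) = @_;
    my $type = $TYPE_OF{$key} // 'string';
    return $text + 0 if $type eq 'integer';    # numeric context makes JSON::PP emit a number
    if ($type eq 'boolean') {
        # xs:boolean allows the lexical forms true/false/1/0
        return ($text eq 'true' || $text eq '1') ? JSON::PP::true : JSON::PP::false;
    }
    return $text;
}

my $xml = <<'XML';
<object name="foo">
  <values>one</values>
  <values>two</values>
  <values>three</values>
  <param attr="2" value="true" />
</object>
XML

my $root = XML::LibXML->load_xml(string => $xml)->documentElement;

# Build the Perl structure explicitly instead of relying on XML::Simple's guesses.
my %data = (name => $root->getAttribute('name'));
$data{values} = [ map { $_->textContent } $root->findnodes('./values') ];

my ($param) = $root->findnodes('./param');
for my $attr ($param->attributes) {
    my $name = $attr->nodeName;
    $data{param}{$name} = typed('param/@' . $name, $attr->value);
}

print JSON::PP->new->canonical->pretty->encode(\%data);
```

The explicit walk is more typing than XML::Simple, but every decision about arrays, attributes and types is visible in the code, which is what makes it possible to attach type information at all.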

So really - what you need to do is backtrack a bit, and explain what you're trying to accomplish and why.

Failing that though - I would probably try a simplistic 'type test' within Perl, using a regex to detect whether something is 'just' numeric or 'just' boolean, and treat everything else as a string.
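
A minimal sketch of such a type test, using only the core JSON::PP module. The helper name `guess_type` and the sample hash are made up for illustration, and the patterns are deliberately narrow assumptions (only the literals `true`/`false` and plain decimal numbers without leading zeros are converted).

```perl
use strict;
use warnings;
use JSON::PP;

# Guess a JSON type from the text alone; the XSD is not consulted.
sub guess_type {
    my ($text) = @_;
    return JSON::PP::true  if $text eq 'true';
    return JSON::PP::false if $text eq 'false';
    # Plain integers and decimals only; adding 0 makes JSON::PP emit a number.
    return $text + 0 if $text =~ /\A-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?\z/;
    return $text;    # everything else stays a string
}

# Sample values as they would come out of the XML, all strings.
my %raw   = (attr => '2', value => 'true', name => 'foo');
my %typed = map { $_ => guess_type($raw{$_}) } keys %raw;

print JSON::PP->new->canonical->encode(\%typed), "\n";
```

Note that this guesses from the value only, so a text field that happens to contain just digits will also come out as a number, which is exactly the case the question wants to avoid.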

Sobrique