8

I would like to convert XML into JSON (concretely, a OAI-PMH response). I am currently using node.js xml2js, but the issue is that JSON is very verbose, with way to many levels of nesting and arrays, even when there is only one element as a child and will never be more than one. The issue is that xml2js does not know anything about the schema of the XML file, so it has to be conservative.

My question is, is there any other (preferably JavaScript) code which would use XML Schema to guide conversion process? So if schema defines types and structure of XML, that than JSON takes advantage of this and have correct types automatically, and not unnecessary array levels.

Mitar
  • 6,756
  • 5
  • 54
  • 86

6 Answers6

1

I had a similar, but opposite, problem with X2JS : it would not create a list if there was only one child element ( even when it should be a list ). The solution ( for me ) was to supply the extra options "arrayFormPaths" to the converter -- this causes matching elements to become an array, even if there is only one element.

I agree that there should be a way to do this using an XML Schema...but I couldn't find anything either.

Nick Perkins
  • 8,034
  • 7
  • 40
  • 40
  • @Mitar, that looks promising. Any plans to update it? I just tried installing: "found 13 vulnerabilities (1 low, 10 moderate, 2 high)" – JohnK Jul 23 '19 at 16:00
  • Where can I find your referenced X2JS library? It seems distinct from the OP's xml2js, as I can not find your "arrayFormPaths" option there. Is it this one here? https://github.com/abdolence/x2js – Marcel Aug 21 '21 at 16:04
1

At the end, I decided just to implement such a package: xml4js.

So for XML taken from XML Primer:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20" xmlns="http://www.example.com/PO">
   <shipTo country="US">
      <name>Alice Smith</name>
      <street>123 Maple Street</street>
      <city>Mill Valley</city>
      <state>CA</state>
      <zip>90952</zip>
   </shipTo>
   <billTo country="US">
      <name>Robert Smith</name>
      <street>8 Oak Avenue</street>
      <city>Old Town</city>
      <state>PA</state>
      <zip>95819</zip>
   </billTo>
   <comment>Hurry, my lawn is going wild!</comment>
   <items>
      <item partNum="872-AA">
         <productName>Lawnmower</productName>
         <quantity>1</quantity>
         <USPrice>148.95</USPrice>
         <comment>Confirm this is electric</comment>
      </item>
      <item partNum="926-AA">
         <productName>Baby Monitor</productName>
         <quantity>1</quantity>
         <USPrice>39.98</USPrice>
         <shipDate>1999-05-21</shipDate>
      </item>
   </items>
</purchaseOrder>

Without using a XML Schema to guide a conversion process, with explicit arrays turned on, you would get:

{
  "purchaseOrder": {
    "$": {
      "orderDate": "1999-10-20",
      "xmlns": "http://www.example.com/PO"
    },
    "shipTo": [
      {
        "$": {
          "country": "US"
        },
        "name": [
          "Alice Smith"
        ],
        "street": [
          "123 Maple Street"
        ],
        "city": [
          "Mill Valley"
        ],
        "state": [
          "CA"
        ],
        "zip": [
          "90952"
        ]
      }
    ],
    "billTo": [
      {
        "$": {
          "country": "US"
        },
        "name": [
          "Robert Smith"
        ],
        "street": [
          "8 Oak Avenue"
        ],
        "city": [
          "Old Town"
        ],
        "state": [
          "PA"
        ],
        "zip": [
          "95819"
        ]
      }
    ],
    "comment": [
      "Hurry, my lawn is going wild!"
    ],
    "items": [
      {
        "item": [
          {
            "$": {
              "partNum": "872-AA"
            },
            "productName": [
              "Lawnmower"
            ],
            "quantity": [
              "1"
            ],
            "USPrice": [
              "148.95"
            ],
            "comment": [
              "Confirm this is electric"
            ]
          },
          {
            "$": {
              "partNum": "926-AA"
            },
            "productName": [
              "Baby Monitor"
            ],
            "quantity": [
              "1"
            ],
            "USPrice": [
              "39.98"
            ],
            "shipDate": [
              "1999-05-21"
            ]
          }
        ]
      }
    ]
  }
}

But the package gives you this:

{
  "purchaseOrder": {
    "$": {
      "orderDate": "1999-10-20T00:00:00.000Z"
    },
    "shipTo": {
      "$": {
        "country": "US"
      },
      "name": "Alice Smith",
      "street": "123 Maple Street",
      "city": "Mill Valley",
      "state": "CA",
      "zip": 90952
    },
    "billTo": {
      "$": {
        "country": "US"
      },
      "name": "Robert Smith",
      "street": "8 Oak Avenue",
      "city": "Old Town",
      "state": "PA",
      "zip": 95819
    },
    "comment": "Hurry, my lawn is going wild!",
    "items": {
      "item": [
        {
          "$": {
            "partNum": "872-AA"
          },
          "productName": "Lawnmower",
          "quantity": 1,
          "USPrice": 148.95,
          "comment": "Confirm this is electric"
        },
        {
          "$": {
            "partNum": "926-AA"
          },
          "productName": "Baby Monitor",
          "quantity": 1,
          "USPrice": 39.98,
          "shipDate": "1999-05-21T00:00:00.000Z"
        }
      ]
    }
  }
}
Mitar
  • 6,756
  • 5
  • 54
  • 86
  • The example which is attached are not working as expected.https://github.com/peerlibrary/node-xml4js Any help, would be greatly helpful – sharath chandra Feb 04 '21 at 16:45
0

Found the following project library on GitHub that does the job : Schema Aware XML to JSON translator .

This library does the job in java. It is still not fully mature but does the conversion using the Schema provided.

goodman
  • 23
  • 7
0

There is a very well maintained python package named xmlschema that would do that without any hassle.

Using to_dict method.

import xmlschema
import simplejson as json

my_schema = xmlschema.XMLSchema("../shiporder.xsd")
obj = my_schema.to_dict('../shiporder.xml')
# in case you want to save the json in a file
json.dumps(obj) 

You can get the example xml and xsd for ship order from w3schools. Once downloaded try to leave only one item in the ship order xml and see how the schema retains the array type. The number types are also preserved after conversion to json.

Wildhammer
  • 2,017
  • 1
  • 27
  • 33
  • I tried this approach. It seems `xmlschema` or `simplejson` have a dependency on `Counter` which is only available in Python 2.7. However `xmlschema` won't install on Python 2.7. – user358041 Aug 07 '23 at 17:24
-1

If you are looking for a java solution.

jschnasse
  • 8,526
  • 6
  • 32
  • 72
-5

An XML schema is used for XML validation. It is not possible to convert anything with a XML schema. A schema defines the structure of an XML document. If you want to convert something you need a XSLT which is a XML transformation specification. See here how to do it in JavaScript: http://www.w3schools.com/xsl/xsl_client.asp

ceving
  • 21,900
  • 13
  • 104
  • 178
  • 3
    I want to convert structure of XML document into JSON. And of course it can help. For example, `maxOccurs` can help you know if you should create an array or not (if `maxOccurs` is 1, you do not need an array). And `type` can help you convert to JSON types. – Mitar Jun 13 '14 at 17:40
  • 1
    @Mitar You can convert a XML schema to JSON, but this also requires XSLT. A schema does not convert anything. – ceving Jun 14 '14 at 17:33
  • I am saying that schema could be used to help in the conversion process. – Mitar Jun 15 '14 at 01:39
  • @Mitar Yes a text editor could also help the transformation process. – ceving Jun 15 '14 at 19:02
  • https://www.npmjs.com/package/xml4js "Package builds upon node-xml2js, detects and parses XML Schema which is then used to transform JavaScript object into a consistent schema-driven structure." – gvlx Sep 09 '15 at 14:01