
I have to parse XML files whose root element is defined as an xs:choice. Some of the element types are hex encoded in little-endian order, others in big-endian order. Using a schema, I can define different types for these.

I am trying to use the xmlschema package, where I can use the value_hook method to alter element value parsing. The value_hook is a callback that receives the value from the XML file, as well as the XSD type. This allows me to convert the hex values into int with the correct endianness.
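To make the intended conversion concrete, here is a minimal sketch of such a hook. The type names `HexLE` and `HexBE` are hypothetical placeholders for whatever the XSD actually defines, and the hook assumes the value arrives as a plain hex string; depending on the xmlschema version and options, hexBinary values may arrive as a different object, in which case this sketch returns them unchanged.

```python
from types import SimpleNamespace


def hex_to_int(hex_text, byteorder):
    """Interpret a hex string such as '0100' as an unsigned integer."""
    return int.from_bytes(bytes.fromhex(hex_text.strip()), byteorder)


# Hypothetical XSD type names; replace with the names used in the real schema.
BYTE_ORDER_BY_TYPE = {'HexLE': 'little', 'HexBE': 'big'}


def endian_value_hook(value, xsd_type):
    """Convert hex strings to int according to the (assumed) XSD type name."""
    byteorder = BYTE_ORDER_BY_TYPE.get(xsd_type.name)
    if byteorder is None or not isinstance(value, str):
        # Unknown type or non-string value: leave it untouched.
        return value
    return hex_to_int(value, byteorder)


# Stand-in for the XSD type object, only for trying the hook without a schema.
fake_type = SimpleNamespace(name='HexLE')
print(endian_value_hook('0100', fake_type))
```

With a real schema, the hook would be passed as `value_hook=endian_value_hook` in the same way as in the code at the end of this question.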

The problem I have is that the xmlschema decode function returns the parsed XML file where dictionaries are used to represent the structure. This does not preserve the order of the elements in the root element.

My example XML file:

<?xml version="1.0" encoding="UTF-8"?>

<A>
    <B>b1</B>
    <C>c</C>
    <B>b2</B>
</A>

is parsed into this: {'B': ['b1', 'b2'], 'C': ['c']}. The sequence B, C, B is lost.

I need something similar to ElementTree, where using get_children() I can iterate over all children in document order, combined with the value_hook feature of xmlschema or similar access to the type defined in the XSD file.
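For comparison, this is roughly the order-preserving behaviour I am after, sketched with the standard library only. The mapping from tag to byte order is written by hand here (an assumption for illustration; ideally it would come from the XSD types), and the sample document uses hex payloads instead of the `b1`/`c`/`b2` placeholders above.

```python
import xml.etree.ElementTree as ET

# Illustrative sample with hex payloads (not the original test file).
XML_TEXT = """<A>
    <B>0100</B>
    <C>0100</C>
    <B>ff00</B>
</A>"""

# Hypothetical hand-written mapping from element tag to byte order.
BYTE_ORDER_BY_TAG = {'B': 'little', 'C': 'big'}


def decode_children_in_order(xml_text):
    """Return (tag, value) pairs for the root's children in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for child in root:  # iterating an Element yields children in document order
        text = (child.text or '').strip()
        byteorder = BYTE_ORDER_BY_TAG.get(child.tag)
        if byteorder is None:
            value = text  # no conversion known for this tag
        else:
            value = int.from_bytes(bytes.fromhex(text), byteorder)
        result.append((child.tag, value))
    return result


ordered = decode_children_in_order(XML_TEXT)
print(ordered)
```

A list of pairs keeps the B, C, B sequence intact, which a dictionary keyed by element name cannot do.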

Thanks for any info.

Python code:

from xmlschema import XMLSchema

def parsing_value_hook(value, xsd):
    print(f'parsing hook: {value}, {xsd.name}')
    return value

test_prefix = 'test'
xml_file = test_prefix + '.xml'
xsd_file = test_prefix + '.xsd'

xml_schema = XMLSchema(xsd_file)
parsed = xml_schema.decode(xml_file, value_hook=parsing_value_hook)

print(parsed)
PaulCC
  • The [`lxml`](https://lxml.de/) module includes [xml schema support](https://lxml.de/validation.html). Maybe this would get you what you want (ElementTree + validation)? – larsks Jul 20 '23 at 19:37
  • What's your schema look like? `xs:sequence` demands that order be preserved - does it use that? – bbayles Jul 20 '23 at 19:58
  • @bbayles My test XSD uses an xs:choice. The issue I am having is not with XSD-based validation or with enforcing or relaxing the order. My problem is that xmlschema uses a dictionary to represent the parsed structure, so all elements of the same type end up under a single key, and the order of elements with different types is lost. – PaulCC Jul 20 '23 at 20:04
  • @larsks I have been working with lxml on this project so far, and I do have the validation working. What I am missing with lxml is a way to get from a parsed value to the XSD type of its element. lxml parses hexBinary as a Python str. If I could get the corresponding XSD type, as the value_hook of xmlschema does, that would work for me. It could be that I will have to stay with lxml and match parsed values to their types myself. – PaulCC Jul 20 '23 at 20:17
  • https://github.com/sissaschool/xmlschema/issues/69 describes a similar issue. https://github.com/sissaschool/xmlschema/pull/117 is aimed at fixing it, but the examples show that your expected behavior depends on the schema defining the order of tags explicitly. – bbayles Jul 20 '23 at 21:17
