Can somebody point me in the right direction? I'm unable to generate binding classes with PyXB when element names are non-ASCII.

The minimal reproducible example:

<?xml version="1.0" encoding="utf8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Country" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />       
        <xs:element name="Дом" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Note the <xs:element name="Дом" type="xs:string" /> line, where I use Cyrillic. The file itself is encoded as UTF-8. However, when I try:

pyxbgen -u example.xsd -m example

I get the error:

Traceback (most recent call last):
  File "/home/sergey/anaconda3/lib/python3.5/xml/sax/expatreader.py", line 210, in feed
    self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 9, column 26

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sergey/anaconda3/bin/pyxbgen", line 52, in <module>
    generator.resolveExternalSchema()
.......

The reported position (line 9, column 26) points to the Cyrillic name of the element. What am I missing?

Sergey Bushmanov

1 Answer

In the XML declaration, UTF8 is spelled "utf-8", with a hyphen. Change encoding="utf8" to encoding="utf-8" and the schema parses:

lilith[33]$ head -1 /tmp/cyr.xsd 
<?xml version="1.0" encoding="utf-8"?>
lilith[34]$ pyxbgen -u /tmp/cyr.xsd -m cyr
WARNING:pyxb.binding.generate:Element use None.Дом renamed to emptyString
Python for AbsentNamespace0 requires 1 modules

PyXB generating an element named emptyString instead of one named Дом is a problem, though. PyXB was designed long before Python 3 and its Unicode support, and it goes to great effort to convert text to valid Python 2 identifiers.

Since you're using Python 3, it should be possible to bypass that conversion, but it's not quite trivial. Track issue 67; alternatively, if there's a Cyrillic transliteration you prefer, the technique demonstrated here for Japanese might work.
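To illustrate the two options, here is a rough sketch in plain Python. It is not PyXB's API and not the linked technique: the TRANSLIT table is a partial, hypothetical mapping and to_ascii_identifier is a made-up helper. It shows that Python 3 accepts Дом as an identifier directly, and how a transliteration could produce an ASCII name instead.

```python
import keyword

# Partial, illustrative Cyrillic-to-Latin table (hypothetical; extend to taste).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
}


def to_ascii_identifier(name: str) -> str:
    """Map a name to an ASCII-only Python identifier using TRANSLIT."""
    out = []
    for ch in name:
        latin = TRANSLIT.get(ch.lower())
        if latin is not None:
            # Preserve the case of the original letter.
            out.append(latin.capitalize() if ch.isupper() else latin)
        elif ord(ch) < 128 and (ch.isalnum() or ch == "_"):
            out.append(ch)
        else:
            # Anything unmapped becomes an underscore.
            out.append("_")
    ident = "".join(out)
    if not ident or ident[0].isdigit():
        ident = "_" + ident
    if keyword.iskeyword(ident):
        ident += "_"
    return ident


# Python 3 accepts non-ASCII letters in identifiers, so no conversion is strictly needed.
print("Дом".isidentifier())
# An ASCII alternative for code that must stay ASCII-only.
print(to_ascii_identifier("Дом"))
```

Which spelling a transliteration should produce is a matter of taste, so the table is the part to adapt.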

pabigot