
I'm a complete beginner at coding. I study IT and have a school project in which I must convert a .txt file into an XML file. I have managed to create a tree and subelements, but I must also put an XML namespace in the code, because the resulting XML file must be opened in a program that displays the information as a table, among other things. Without the schema reference from the XML namespace, it won't open anything. Can someone help me with how to put an .xsd in my code?

This is the schema: http://www.pufbih.ba/images/stories/epp_docs/PaketniUvozObrazaca_V1_0.xsd

Example of the XML file I must create: http://www.pufbih.ba/images/stories/epp_docs/4200575050089_1022.xml

And in the first row I have the schema that I must include: "urn:PaketniUvozObrazaca_V1_0.xsd"

This is the code I have created so far:

import xml.etree.ElementTree as xml

def GenerateXML(GIP1022):
    root=xml.Element("PaketniUvozObrazaca")
    p1=xml.Element("PodaciOPoslodavcu")
    root.append(p1)

    jib=xml.SubElement(p1,"JIBPoslodavca")
    jib.text="4254160150005"
    pos=xml.SubElement(p1,"NazivPoslodavca")
    pos.text="MOJATVRTKA d.o.o. ORAŠJE"
    zah=xml.SubElement(p1,"BrojZahtjeva")
    zah.text="8"
    datz=xml.SubElement(p1,"DatumPodnosenja")
    datz.text="2021-01-01"

    tree=xml.ElementTree(root)
    with open(GIP1022,"wb") as files:
        tree.write(files)

if __name__=="__main__":
    GenerateXML("primjer.xml")
mzjn

3 Answers

1

The official documentation is not very explicit about how to work with namespaces in ElementTree, but the core of it is that ElementTree takes a very fundamental(ist) approach: instead of manipulating namespace prefixes / aliases, ElementTree uses Clark's notation.

So e.g.

<bar xmlns="foo">

or

<x:bar xmlns:x="foo">

(the element bar in the foo namespace) would be written

{foo}bar

>>> from xml.etree.ElementTree import Element, SubElement, QName, tostring, register_namespace
>>> tostring(Element('{foo}bar'), encoding='unicode')
'<ns0:bar xmlns:ns0="foo" />'

Alternatively (and sometimes more conveniently for authoring and manipulating) you can use QName objects, which can either take a tag name in Clark's notation or take a namespace and a tag name separately:

>>> tostring(Element(QName('foo', 'bar')), encoding='unicode')
'<ns0:bar xmlns:ns0="foo" />'
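
To make the equivalence concrete, here is a minimal sketch. It reuses the placeholder namespace 'foo' and tag 'bar' from above; the variable names are only illustrative. It checks that a Clark's notation string and both QName forms name the same element and serialize identically.

```python
from xml.etree.ElementTree import Element, QName, tostring

# Three ways of naming the element "bar" in the namespace "foo".
clark = Element('{foo}bar')
qname_split = Element(QName('foo', 'bar'))
qname_clark = Element(QName('{foo}bar'))

# A QName keeps the Clark's notation string in its .text attribute.
qname_text = QName('foo', 'bar').text

# Serialize all three elements so they can be compared.
serialized = [tostring(e, encoding='unicode')
              for e in (clark, qname_split, qname_clark)]
print(qname_text)
print(serialized[0])
```

All three elements should serialize to the same string, since the prefix is generated at serialization time and is not part of the element's name.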

So while ElementTree doesn't have a namespace object per se, you can create namespaced elements like this, probably via a helper partially applying QName:

>>> from functools import partial
>>> ns = partial(QName, 'urn:PaketniUvozObrazaca_V1_0.xsd')
>>> root = Element(ns("PaketniUvozObrazaca"))
>>> SubElement(root, ns("PodaciOPoslodavcu"))
<Element <QName '{urn:PaketniUvozObrazaca_V1_0.xsd}PodaciOPoslodavcu'> at 0x7f502481bdb0>
>>> tostring(root, encoding='unicode')
'<ns0:PaketniUvozObrazaca xmlns:ns0="urn:PaketniUvozObrazaca_V1_0.xsd"><ns0:PodaciOPoslodavcu /></ns0:PaketniUvozObrazaca>'

Now there are a few important considerations here:

First, as you can see, the prefix chosen when serialising is arbitrary. This is in keeping with ElementTree's fundamentalist approach to XML (the prefix should not matter), but the module has since gained a global register_namespace function, which allows registering specific prefixes:

>>> register_namespace('xxx', 'urn:PaketniUvozObrazaca_V1_0.xsd')
>>> tostring(root, encoding='unicode')
'<xxx:PaketniUvozObrazaca xmlns:xxx="urn:PaketniUvozObrazaca_V1_0.xsd"><xxx:PodaciOPoslodavcu /></xxx:PaketniUvozObrazaca>'

You can also pass a single default_namespace to (some) serialization functions to specify the, well, default namespace (tostring accepts it from Python 3.8):

>>> tostring(root, encoding='unicode', default_namespace='urn:PaketniUvozObrazaca_V1_0.xsd')
'<PaketniUvozObrazaca xmlns="urn:PaketniUvozObrazaca_V1_0.xsd"><PodaciOPoslodavcu /></PaketniUvozObrazaca>'
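
One caveat with default_namespace, sketched here with tag names borrowed from the question (the variable names are illustrative): the option only works when every element in the tree is namespace-qualified. If any tag is a plain, unqualified name, serialization raises a ValueError.

```python
import io
import xml.etree.ElementTree as ET

URI = "urn:PaketniUvozObrazaca_V1_0.xsd"

# Root is qualified, but the child is a plain (unqualified) tag.
root = ET.Element(f"{{{URI}}}PaketniUvozObrazaca")
ET.SubElement(root, "PodaciOPoslodavcu")

try:
    ET.ElementTree(root).write(io.BytesIO(), default_namespace=URI)
    error = None
except ValueError as exc:
    error = str(exc)

# Qualifying every tag makes the same call succeed.
root2 = ET.Element(f"{{{URI}}}PaketniUvozObrazaca")
ET.SubElement(root2, f"{{{URI}}}PodaciOPoslodavcu")
buf = io.BytesIO()
ET.ElementTree(root2).write(buf, default_namespace=URI)
output = buf.getvalue().decode("ascii")

print(error)
print(output)
```

So with this option, each subelement has to be created with a qualified name, for instance through a QName helper like the one shown earlier.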

A second, possibly larger, issue is that ElementTree does not support validation.

The Python standard library does not provide any validating parser or tree builder, whether for DTD, RELAX NG, XML Schema, or anything else. Not by default, and not optionally.

lxml is probably the main alternative supporting validation (against multiple types of schema). Its core API follows ElementTree but extends it in multiple ways and directions (including much more precise namespace prefix support, and prefix round-tripping). But even then validation is (AFAIK) mostly explicit, at least when generating / serializing documents.
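
As a rough sketch of what explicit validation with lxml can look like, assuming lxml is installed: the tiny inline schema, the namespace "urn:example-schema" and the element names Root and Item are all made up for illustration and have nothing to do with the real PaketniUvozObrazaca schema.

```python
from lxml import etree

NS = "urn:example-schema"  # hypothetical namespace, for illustration only

# A deliberately tiny, made-up schema: a Root element holding one Item.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example-schema"
           elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

# nsmap={None: NS} asks lxml to serialize NS as the default namespace.
good = etree.Element(f"{{{NS}}}Root", nsmap={None: NS})
etree.SubElement(good, f"{{{NS}}}Item").text = "hello"

# Same local name, but not in the schema's namespace.
bad = etree.Element("Root")

good_is_valid = schema.validate(good)
bad_is_valid = schema.validate(bad)
print(good_is_valid, bad_is_valid)
```

Validation here is a separate, explicit step: nothing stops you from building or writing the invalid tree, you only find out when you call validate.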

Masklinn
  • Thanks for the quick response. To be honest, this answer is a little bit hard (advanced) for me, but I will do my best to comprehend it. xD – Obedar_King Jul 27 '21 at 10:28
0

The tree.write() method takes a default_namespace argument.

What happens if you change that line to the following?

tree.write(files, default_namespace="urn:PaketniUvozObrazaca_V1_0.xsd")
Beetle
  • Sorry for my late response, and thank you for the help. If I understood you correctly, this is the error I get: https://imgur.com/a/LK0gMrr – Obedar_King Jul 31 '21 at 11:12
  • A search of SO (or indeed Google) for `ValueError: cannot use non-qualified names with default_namespace option` produces [this answer](https://stackoverflow.com/a/18340978/952580). @Andomar isn't doing exactly the same thing as you, but does it work to put `xml.register_namespace("", "urn:PaketniUvozObrazaca_V1_0.xsd")` just before your `root=xml.Element("PaketniUvozObrazaca")`? – Beetle Aug 03 '21 at 17:01
  • I tried it that way and got no errors. But it didn't solve my problem, because it does not put the schema in my XML; afterwards my XML looks like this ( https://imgur.com/a/cWhSSya ). You can see that the link is not in the XML I wrote, and it must be the same as in the document on the left so that my software can read the XML I create. I'm now trying to add the xsd with the lxml package, but of course this is a pain in the ass too.... – Obedar_King Aug 03 '21 at 18:06
0

What you want is to add a default namespace declaration (xmlns="urn:PaketniUvozObrazaca_V1_0.xsd") to the root element. I have edited the code in the question to show you how this can be done.

import xml.etree.ElementTree as ET

def GenerateXML(GIP1022): 
    # Create the PaketniUvozObrazaca root element in the urn:PaketniUvozObrazaca_V1_0.xsd namespace 
    root = ET.Element("{urn:PaketniUvozObrazaca_V1_0.xsd}PaketniUvozObrazaca")

    # Add subelements
    p1 = ET.Element("PodaciOPoslodavcu")
    root.append(p1)

    jib = ET.SubElement(p1,"JIBPoslodavca")
    jib.text = "4254160150005"
    pos = ET.SubElement(p1,"NazivPoslodavca")
    pos.text = "MOJATVRTKA d.o.o. ORAŠJE"
    zah = ET.SubElement(p1,"BrojZahtjeva")
    zah.text = "8"
    datz = ET.SubElement(p1,"DatumPodnosenja")
    datz.text = "2021-01-01"

    # Make urn:PaketniUvozObrazaca_V1_0.xsd the default namespace (no prefix)
    ET.register_namespace("", "urn:PaketniUvozObrazaca_V1_0.xsd")

    # Prettify output (requires Python 3.9 or later)
    ET.indent(root)

    tree = ET.ElementTree(root)

    with open(GIP1022,"wb") as files:
        tree.write(files)

if __name__=="__main__":
    GenerateXML("primjer.xml")

Contents of primjer.xml:

<PaketniUvozObrazaca xmlns="urn:PaketniUvozObrazaca_V1_0.xsd">
  <PodaciOPoslodavcu>
    <JIBPoslodavca>4254160150005</JIBPoslodavca>
    <NazivPoslodavca>MOJATVRTKA d.o.o. ORA&#352;JE</NazivPoslodavca>
    <BrojZahtjeva>8</BrojZahtjeva>
    <DatumPodnosenja>2021-01-01</DatumPodnosenja>
  </PodaciOPoslodavcu>
</PaketniUvozObrazaca>

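Note that only the root element is explicitly bound to a namespace in the code. The subelements do not need to be in a namespace when they are added: they are written without a prefix, so any parser that reads primjer.xml places them in the default namespace declared on the root. The end result is an XML document (primjer.xml) where all elements belong to the same default namespace.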
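
This can be checked with a small round-trip sketch (same namespace URI and tag names as above, with illustrative variable names): in memory the subelement has no namespace, but once the serialized text is parsed again, its tag carries the namespace.

```python
import xml.etree.ElementTree as ET

URI = "urn:PaketniUvozObrazaca_V1_0.xsd"

root = ET.Element(f"{{{URI}}}PaketniUvozObrazaca")
child = ET.SubElement(root, "PodaciOPoslodavcu")  # no namespace in memory

# Serialize with URI registered as the default (prefix-less) namespace.
ET.register_namespace("", URI)
xml_text = ET.tostring(root, encoding="unicode")

# Parse the text back: the parser applies the default namespace
# declared on the root to the unprefixed child as well.
reparsed = ET.fromstring(xml_text)
tag_in_memory = child.tag
tag_after_parsing = reparsed[0].tag

print(tag_in_memory)
print(tag_after_parsing)
```

Keep in mind that register_namespace changes a global registry, so it affects every later serialization in the same process.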

The above is not the only way to create an element in a namespace. For example, instead of the {namespace-uri}name notation, the QName class can be used. See https://stackoverflow.com/a/58678592/407651.

mzjn
  • BRILLIANT, thank you a lot!!! This solved my problem!!! The software that I must use to read the XML file read it and created an empty table. Now I must write the rest of my XML file to see if all elements will appear in the table in the software. – Obedar_King Aug 04 '21 at 20:19