Parsing XML with ETREE: finding 'xl' element properties

Question

I have the (abbreviated) XML file below (I also changed the element names a bit to obscure the application).

<?xml version="1.0" encoding="UTF-8" ?>
<Workplace Type="PP-1"
            Version="0.2"
            xmlns:xl="http://www.w3.org/1999/xlink">
    <Template xl:actuate="perFile"
                xl:href="../templates/opt/CMPRfile"
                xl:show="none"
                xl:title="CPP1"
                xl:type="verycomplicated"/>
    <ProjectID xl:actuate="withtrain"
                xl:href="filename.ppp"
                xl:show="none"
                xl:type="evenmorecomplicated"/>
</Workplace>

I want to parse the XML file with ETREE and find the values for the 'xl:' elements. How exactly do I do that? They do not seem to be attributes or text. Is this some kind of special property? I tried to find the value for 'href', for example, using some code like below.

I tried to look up and figure out what the 'xl' labels are, but no luck. What is also curious is that when I print the attributes for the 'Workplace' node, I get 'Type' and 'Version', but not 'xmlns'. So I suspect that this is some kind of special attribute? This is my first time doing serious XML parsing, so I am probably missing something here.

I tried this:

    xml_namespace = "{http://www.w3.org/1999/xlink}"
    tree = ET.parse(project_file_name)
    xml_root_element = tree.getroot()

    projectid_element = xml_root_element.find(xml_namespace + 
    "ProjectId")
    
    # Doesn't work
    value = projectid_element.text 
    value = projectid_element.attrib["href"]
    value = projectid_element.attrib["xl:href"]

    print("Value: " + value)

I was expecting the value 'filename.ppp'.

Edit 20221221_1557:

I did look at the article mentioned by FordPrefect, but I still do not seem to be able to extract the values. I have this code:


    tree, ns = parse_and_get_ns(project_file_name)
    xml_root_element = tree.getroot()
    print("xml_root_element type: " + str(type(xml_root_element)))
    print("Namespaces found: ")
    print(ns)
    elements = xml_root_element.iterfind("xl:href", ns)
    print("elements type: " + str(type(elements)))
    for ele in elements:
        print("elements ele object type: " + str(type(ele)))

and I get this as output:

xml_root_element type: <class 'xml.etree.ElementTree.Element'>
Namespaces found:
{'xl': '{http://www.w3.org/1999/xlink}'}

So you can see that the root element I am iterating over is indeed an element, and that the final result does not contain any objects. However, I expected at least one to be there.

`ProjectID` is an element, `xl:actuate` is an attribute of that element. — LMC, Dec 21 '22 at 15:25
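
A minimal sketch of what that distinction means in practice, assuming the abbreviated XML from the question (with the root element properly closed and the XML declaration left out for brevity). The names `XLINK`, `href_key` and `hrefs` are made up for illustration. Because `xl:href` is an attribute, `find()` and `iterfind()` with a path like `"xl:href"` will not match it, since such a path selects child elements. One way around this is to walk the elements and look the attribute up under its expanded name `{namespace-uri}href`:

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "xl" prefix in the question's XML (no braces here).
XLINK = "http://www.w3.org/1999/xlink"

xml_text = """
<Workplace Type="PP-1" Version="0.2" xmlns:xl="http://www.w3.org/1999/xlink">
    <Template xl:actuate="perFile" xl:href="../templates/opt/CMPRfile"
              xl:show="none" xl:title="CPP1" xl:type="verycomplicated"/>
    <ProjectID xl:actuate="withtrain" xl:href="filename.ppp"
               xl:show="none" xl:type="evenmorecomplicated"/>
</Workplace>
"""

root = ET.fromstring(xml_text)

# ElementTree stores namespaced attribute names as "{uri}localname".
href_key = f"{{{XLINK}}}href"

# Collect the href of every element (root included) that carries one.
hrefs = {
    elem.tag: elem.get(href_key)
    for elem in root.iter()
    if href_key in elem.attrib
}
print(hrefs)
```

Note that in a prefix map passed to `find()` or `iterfind()`, the value should be the bare URI; the braces only appear in the expanded `{uri}name` form.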

score 1 · Answer 1 · answered Dec 21 '22 at 15:39

After some fiddling I found out that these elements are keys. You can get them by using the keys() method of an element, like so:

    xml_element = root.find(
        xml_namespace + "Model_Segment"
    )  # One of the elements in the XML example

    keys = xml_element.keys()
    for key in keys:
        print("key: " + key)

Output:

key: {http://www.w3.org/1999/xlink}href
key: {http://www.w3.org/1999/xlink}label
key: {http://www.w3.org/1999/xlink}role
key: {http://www.w3.org/1999/xlink}title
key: {http://www.w3.org/1999/xlink}type

I will also mark the answer from FordPrefect as the answer, since it contains very useful information within the context of this question.
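
For the original goal of reading 'filename.ppp', here is a short sketch, assuming the abbreviated XML from the question (trimmed to the `ProjectID` element, with the root properly closed). The constant name `XLINK_NS` is only illustrative. `ProjectID` itself carries no namespace prefix, so it is looked up by its plain, case-sensitive tag name; only the attribute key needs the namespace in braces:

```python
import xml.etree.ElementTree as ET

# Expanded-name prefix for attributes in the XLink namespace.
XLINK_NS = "{http://www.w3.org/1999/xlink}"

xml_text = """
<Workplace Type="PP-1" Version="0.2" xmlns:xl="http://www.w3.org/1999/xlink">
    <ProjectID xl:actuate="withtrain" xl:href="filename.ppp"
               xl:show="none" xl:type="evenmorecomplicated"/>
</Workplace>
"""

root = ET.fromstring(xml_text)

# The element is not in a namespace, so no prefix goes on the tag name.
projectid_element = root.find("ProjectID")

# The attribute is in the XLink namespace, so its key is "{uri}href".
value = projectid_element.get(XLINK_NS + "href")
print("Value: " + value)
```

The same key also works with `projectid_element.attrib[XLINK_NS + "href"]`; `get()` simply returns `None` instead of raising `KeyError` when the attribute is absent.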

Cool that you found out how to solve it. I just recently started looking into xmls, but I couldn't look in deeper right now. — FordPrefect, Dec 21 '22 at 15:59
You should not refer to the items prefixed with `xl:` as elements; they are attributes. That is an important distinction. — mzjn, Dec 21 '22 at 16:34

FordPrefect · Answer 2 · 2022-12-21T13:36:07.960

0

Edit: Forget what I wrote!

Have a look at this answer:
https://stackoverflow.com/a/14853417/10576322

edited Dec 21 '22 at 13:36

answered Dec 21 '22 at 12:56


Thanks a bunch for replying. I did look at the article and I can get the idea. The namespaces are defined first, then the attributes/tags (what is the correct term?) are prefixed with the namespace. If I now try this, then it still does not work, but that does seem to be the way to do it? (I edited the main question) – Niels Broertjes Dec 21 '22 at 14:56
Please don't post link-only answers. – mzjn Dec 21 '22 at 16:35
Sorry. I had an idea on the topic, but later on saw it was wrong. I found the relevant answer and linked it, since I didn't have any further time. – FordPrefect Dec 21 '22 at 17:31
