0

I am trying to parse an XML file with several namespaces. I already have a function that produces a namespace map: a dictionary of namespace prefixes and namespace identifiers (example in the code). However, when I pass this dictionary to the findall() method, it works only for the first namespace and returns nothing if an element on the XML path is in another namespace.

(It works only for the first namespace, the default one, whose prefix in the map is the empty string.)

Here is a code sample:

import xml.etree.ElementTree as ET

file = r'.\folder\example_file.xml'  # path to the file (raw string so backslashes are not treated as escapes)
xml_path = './DataArea/Order/Item/Price'  # XML path to the element node

tree = ET.parse(file)
root = tree.getroot()
# Collect the (prefix, URI) pairs from the namespace declarations in the file
nsmap = dict(node for _, node in ET.iterparse(file, events=['start-ns']))
# This produces a dictionary with namespace prefixes and identifiers, e.g.
# {'': 'http://firstnamespace.example.com/', 'foo': 'http://secondnamespace.example.com/', etc.}
for elem in root.findall(xml_path, nsmap):
    print(elem.tag)  # Do something

EDIT: At mzjn's suggestion, I'm including a sample XML file:

<?xml version="1.0" encoding="utf-8"?>
<SampleOrder xmlns="http://firstnamespace.example.com/" xmlns:foo="http://secondnamespace.example.com/" xmlns:bar="http://thirdnamespace.example.com/" xmlns:sta="http://fourthnamespace.example.com/" languageCode="en-US" releaseID="1.0" systemEnvironmentCode="PROD" versionID="1.0">
    <ApplicationArea>
        <Sender>
            <SenderCode>4457</SenderCode>
        </Sender>
    </ApplicationArea>
    <DataArea>
        <Order>
            <foo:Item>
                <foo:Price>
                    <foo:AmountPerUnit currencyID="USD">58000.000000</foo:AmountPerUnit>
                    <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
                </foo:Price>
                <foo:Description>
                    <foo:ItemCode>259601</foo:ItemCode>
                    <foo:ItemName>PORTAL GUN 6UBC BLUE</foo:ItemName>
                </foo:Description>
            </foo:Item>
            <bar:Supplier>
                <bar:SupplierID>4474</bar:SupplierID>
                <bar:SupplierName>APERTURE SCIENCE, INC</bar:SupplierName>
            </bar:Supplier>
            <sta:DeliveryLocation>
                <sta:RecipientID>103</sta:RecipientID>
                <sta:RecipientName>WARHOUSE 664</sta:RecipientName>
            </sta:DeliveryLocation>
        </Order>
    </DataArea>
</SampleOrder>
Myklebost
  • When searching for elements in namespaces, one option is to use a wildcard. Examples: https://stackoverflow.com/a/61154644/407651, https://stackoverflow.com/a/62117710/407651 – mzjn Jan 19 '22 at 11:25
  • @mzjn The solution you proposed works quite well for most cases, thank you. I edited the question to reflect that. – Myklebost Jan 19 '22 at 15:00

2 Answers

1

Based on Jan Jaap Meijerink's answer and mzjn's comments under the question, the solution is to put a namespace prefix on each step of the XML path. This can be done by inserting the wildcard {*} (supported since Python 3.8), as mzjn's comment and this answer (https://stackoverflow.com/a/62117710/407651) suggest.

To document the solution, you can add this simple operation to your code:

xml_path = './DataArea/Order/Item/Price/TotalAmount'
xml_path_splitted_to_list = xml_path.split('/')
xml_path_with_wildcard_prefix = '/{*}'.join(xml_path_splitted_to_list)

If two or more nodes share the same XML path but are in different namespaces, the findall() method (quite naturally) returns all of those element nodes.
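As a minimal sketch of the wildcard approach, the snippet below runs the same join against a trimmed copy of the sample XML from the question. The inlined SAMPLE_XML string and the add_wildcards helper are hypothetical names introduced only for illustration, and the {*} wildcard assumes Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question, inlined for illustration.
SAMPLE_XML = """\
<SampleOrder xmlns="http://firstnamespace.example.com/"
             xmlns:foo="http://secondnamespace.example.com/">
    <DataArea>
        <Order>
            <foo:Item>
                <foo:Price>
                    <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
                </foo:Price>
            </foo:Item>
        </Order>
    </DataArea>
</SampleOrder>
"""


def add_wildcards(xml_path):
    """Hypothetical helper: put the {*} namespace wildcard in front of every
    step that follows the leading '.' of a relative path."""
    return '/{*}'.join(xml_path.split('/'))


root = ET.fromstring(SAMPLE_XML)
wildcard_path = add_wildcards('./DataArea/Order/Item/Price/TotalAmount')

# No namespace map is needed: {*} matches the element name in any namespace.
amounts = [elem.text for elem in root.findall(wildcard_path)]

print(wildcard_path)
print(amounts)
```

Note that the helper assumes the path starts with './'; a path without the leading dot would leave its first step without a wildcard.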

Myklebost
0

You should specify the namespace prefixes in your xml_path; for the sample XML that would be ./DataArea/Order/foo:Item/foo:Price. It works with the empty prefix because that is the default namespace, so you don't have to specify it in your path.
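A minimal sketch of the prefixed-path approach, run against a trimmed copy of the sample XML from the question. In that sample, DataArea and Order are in the default namespace, while Item and Price carry the foo prefix, so only those two steps need a prefix. The inlined SAMPLE_XML is an assumption made for illustration, and treating the '' key of the map as the default namespace assumes Python 3.8 or newer.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question, inlined for illustration.
SAMPLE_XML = b"""\
<SampleOrder xmlns="http://firstnamespace.example.com/"
             xmlns:foo="http://secondnamespace.example.com/">
    <DataArea>
        <Order>
            <foo:Item>
                <foo:Price>
                    <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
                </foo:Price>
            </foo:Item>
        </Order>
    </DataArea>
</SampleOrder>
"""

# Build the prefix -> URI map the same way as in the question,
# reading from an in-memory buffer instead of a file on disk.
nsmap = dict(
    node for _, node in ET.iterparse(io.BytesIO(SAMPLE_XML), events=['start-ns'])
)

root = ET.fromstring(SAMPLE_XML)

# Unprefixed steps resolve through the '' entry of nsmap (Python 3.8+);
# steps in the second namespace carry the explicit foo: prefix.
prices = root.findall('./DataArea/Order/foo:Item/foo:Price', nsmap)
tags = [elem.tag for elem in prices]

print(nsmap)
print(tags)
```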