
I have an XML file like the one below:

<?xml version="1.0" encoding="utf-8"?>
<Data xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http:/xxxx//bb/v1 /xyz/it/Data/v1/Data-1_2.xsd" version="1.2" xmlns="http://xx/it//Data/v1">
  <Header>
    <Location>abc</Location>
    <Date start="date-time"/>

I am trying to parse different tags and attributes. However, the xmlns declarations seem to break the parsing. I am using code like this:

import xml.etree.ElementTree as ET

tree = ET.parse(input_filename)
root = tree.getroot()
location = tree.find("./Header/Location").text
time = tree.find("./Header/Date").attrib['start']

This works perfectly when I manually remove all the xmlns attributes from the <Data> tag in the input file:

<?xml version="1.0" encoding="utf-8"?>
<Data >
  <Header>
    <Location>abc</Location>
    <Date start="date-time"/>

but keeping them gives an error:

location = tree.find("./Header/Location").text
AttributeError: 'NoneType' object has no attribute 'text'
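To see why find returns None here, the sketch below (using a trimmed copy of the XML above, with closing tags added so it parses) prints the tag names ElementTree actually builds. The default xmlns="..." puts every unprefixed element into that namespace, so ElementTree stores tags in '{uri}local' form and a plain path like ./Header/Location matches nothing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML; closing tags added so it is well-formed
xml_text = '''<Data xmlns="http://xx/it//Data/v1">
  <Header>
    <Location>abc</Location>
    <Date start="date-time"/>
  </Header>
</Data>'''

root = ET.fromstring(xml_text)

# The default namespace is folded into every element's tag name
print(root.tag)      # {http://xx/it//Data/v1}Data
print(root[0].tag)   # {http://xx/it//Data/v1}Header

# A plain path looks for non-namespaced tags, so nothing matches
print(root.find('./Header/Location'))  # None

# Spelling out the namespace URI in '{uri}tag' form does match
ns_path = './{http://xx/it//Data/v1}Header/{http://xx/it//Data/v1}Location'
print(root.find(ns_path).text)  # abc
```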

I have tried most of the previous suggestions I could find, but still no good results. Any help is highly appreciated.

dominant

1 Answer


Modern Python versions (3.8 and later) support namespace wildcards in ElementTree paths: {*}tag matches tag in any namespace, or in none. Consider this:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="utf-8"?>
<Data xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http:/xxxx//bb/v1 /xyz/it/Data/v1/Data-1_2.xsd" version="1.2" xmlns="http://xx/it//Data/v1">
  <Header>
    <Location>abc</Location>
    <Date start="date-time"/>
  </Header>
</Data>'''

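# fromstring returns the root Element (not an ElementTree)
root = ET.fromstring(xml)

# {*} matches the tag in any (or no) namespace; requires Python 3.8+
location = root.find('.//{*}Header/{*}Location').text
_time = root.find('.//{*}Header/{*}Date').attrib['start']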

print(f'Location={location}, time={_time}')
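If you need to be explicit about the namespace, or run on a Python older than 3.8, find and findall also accept a prefix-to-URI mapping as their second argument. A minimal sketch, assuming the root element carries the default namespace; the prefix d is an arbitrary, hypothetical choice, and the URI is read back from the root tag rather than hard-coded:

```python
import xml.etree.ElementTree as ET

xml_text = '''<?xml version="1.0" encoding="utf-8"?>
<Data version="1.2" xmlns="http://xx/it//Data/v1">
  <Header>
    <Location>abc</Location>
    <Date start="date-time"/>
  </Header>
</Data>'''

root = ET.fromstring(xml_text)

# Assumes a namespaced root: '{uri}Data' -> 'uri'
uri = root.tag[1:].partition('}')[0]

# 'd' is an arbitrary prefix that only exists in this mapping, not in the file
ns = {'d': uri}

location = root.find('d:Header/d:Location', ns).text
start = root.find('d:Header/d:Date', ns).get('start')
print(f'Location={location}, time={start}')  # Location=abc, time=date-time
```

Note that the unprefixed start attribute is not affected by the default namespace, so it is still looked up by its plain name.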
  • Thanks, this works for that code, but it gave me issues in other parts where I used root.iter. This solution https://stackoverflow.com/a/18160164/16726552 worked very well without having to change anything in the code, with or without xmlns. – dominant Aug 24 '21 at 16:14
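A common way to keep existing code unchanged, including calls like root.iter('Location'), is to strip the '{uri}' part from every tag right after parsing. The sketch below is a generic version of that idea and is not necessarily the exact code in the linked answer; strip_namespaces is a hypothetical helper name. For a file, pass ET.parse(input_filename).getroot() instead of the fromstring result.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Hypothetical helper: drop '{uri}' prefixes from all element tags, in place."""
    for el in root.iter():
        if isinstance(el.tag, str):  # comments/PIs (if kept by the parser) have non-str tags
            # '{uri}Header' -> 'Header'; tags without a namespace are left unchanged
            _, _, el.tag = el.tag.rpartition('}')
    return root

xml_text = '''<?xml version="1.0" encoding="utf-8"?>
<Data xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http:/xxxx//bb/v1 /xyz/it/Data/v1/Data-1_2.xsd" version="1.2" xmlns="http://xx/it//Data/v1">
  <Header>
    <Location>abc</Location>
    <Date start="date-time"/>
  </Header>
</Data>'''

root = strip_namespaces(ET.fromstring(xml_text))

# The question's original, namespace-free paths now work unchanged
location = root.find('./Header/Location').text
start = root.find('./Header/Date').attrib['start']
locations = [el.text for el in root.iter('Location')]
print(location, start, locations)  # abc date-time ['abc']
```

Only element tags are rewritten here; prefixed attributes such as xsi:schemaLocation keep their '{uri}name' keys in attrib.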