
I have the XML file shown below, which uses namespaces, and I'm trying to extract the values of Node24.

My current code is below, but it doesn't print anything:

import xml.etree.ElementTree as ET

filename = 'ifile.xml'
tree = ET.parse(filename)
root = tree.getroot()

for neighbor in root.iter('Node24'):
    print(neighbor)

My expected output would be:

03-c34ko
04-c64ko
07-c54ko  

This is ifile.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<data-main-43:DATAMAINXZ123 xmlns="https://example.com/DATA-MAIN-XZ123" xmlns:data-gen="https://example.com/DATA-GEN" xmlns:data-main-43="https://example.com/DATA-MAIN-XZ123" xmlns:xsi="http://www.w3.org/2011/XMLSchema-instance" xsi:schemaLocation="https://example.com/DATA-MAIN-XZ123 data-main-ir21-12.1.xsd">
  <MAINXZ123FileHeader>
    <DATAGenSchemaVersion>2.4</DATAGenSchemaVersion>
    <DATAMAINXZ123SchemaVersion>12.1</DATAMAINXZ123SchemaVersion>
  </MAINXZ123FileHeader>
  <Node1>
    <Node2>WTRT DDK</Node2>
    <Node3>XYZW</Node3>
    <Node4>
      <Node5>
        <Node6>XYZW882</Node6>
        <Node5Type>Ter</Node5Type>
        <Node5Data>
          <Node9>
            <Node10>
              <Node11>2019-02-18</Node11>
              <Node12>
                <Node13>
                  <Node14>
                    <Node15>Ermso</Node15>
                    <Node16>
                      <PrimaryNode16>
                        <Node18>19.32</Node18>
                        <Node18>12.11</Node18>
                      </PrimaryNode16>
                      <SecondaryNode16>
                        <Node18>82.97</Node18>
                        <Node18>12.41</Node18>
                      </SecondaryNode16>
                    </Node16>
                    <Node20>Muuatippw</Node20>
                  </Node14>
                </Node13>
              </Node12>
              <Node21>
                <Node22>
                  <Node23>
                    <Node24>03-c34ko</Node24>
                    <Node24>04-c64ko</Node24>
                    <Node24>07-c54ko</Node24>
                  </Node23>
                  <Node26Node22EdgeAgent>
                    <Node26>jjkksonem</Node26>
                    <PrimaryNode18DEANode26>
                      <Node18>2.40</Node18>
                    </PrimaryNode18DEANode26>
                  </Node26Node22EdgeAgent>
                </Node22>
              </Node21>
              <Node28>
                <Node29>
                  <Node30>false</Node30>
                  <Node31>true</Node31>
                </Node29>
              </Node28>
            </Node10>
          </Node9>
        </Node5Data>
      </Node5>
    </Node4>
  </Node1>
</data-main-43:DATAMAINXZ123>

How can I do this? Thanks in advance.
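A quick way to see why the loop above finds nothing is to print the tag names ElementTree actually stores. The snippet below is a minimal sketch that uses a trimmed-down copy of ifile.xml inlined as a string (an assumption so it runs on its own). With a default namespace in effect, every element's tag carries the namespace URI in `{uri}localname` form, so the bare string `'Node24'` never matches.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for ifile.xml: same default namespace, fewer nodes
XML = """<data-main-43:DATAMAINXZ123
    xmlns="https://example.com/DATA-MAIN-XZ123"
    xmlns:data-main-43="https://example.com/DATA-MAIN-XZ123">
  <Node23>
    <Node24>03-c34ko</Node24>
    <Node24>04-c64ko</Node24>
  </Node23>
</data-main-43:DATAMAINXZ123>"""

root = ET.fromstring(XML)

# Every tag is stored as "{namespace-uri}localname"
print(root.tag)          # {https://example.com/DATA-MAIN-XZ123}DATAMAINXZ123
print(root[0][0].tag)    # {https://example.com/DATA-MAIN-XZ123}Node24

# A bare local name therefore matches nothing, which is why the loop prints nothing
print(list(root.iter('Node24')))  # []
```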

mzjn
Ger Cas

2 Answers


This is an alternative answer that uses a regular expression. I converted the parsed XML back into a string, then searched for all the text between the Node24 tags.

import xml.etree.ElementTree as ET
import re

filename = 'ifile.xml'
tree = ET.parse(filename)
root = tree.getroot()
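# encoding='unicode' returns a str; ElementTree writes the default namespace with a generated "ns0" prefix
xml_str = ET.tostring(root, encoding='unicode')
for s in re.findall(r'ns0:Node24>(.*?)</ns0:Node24', xml_str):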
    print(s)

Result:

03-c34ko
04-c64ko
07-c54ko
jose_bacoy
  • Thanks for your help. It seems to work. But if I'd like to extract more elements, would I need to change the node name in the regex? Another question: what does `ns0:...` mean? – Ger Cas Mar 11 '20 at 23:20
  • Why bother parsing as XML when you could just do `open("ifile.xml").read()` and pretend that there is no default namespace? In most cases, using regexes on XML or HTML is not a good idea. https://stackoverflow.com/q/1732348/407651 – mzjn Mar 12 '20 at 10:38
  • @mzjn I don't get your point, hehe. How would you suggest extracting the values of `Node24`, if regex is not a good idea? – Ger Cas Mar 12 '20 at 14:55
  • @mzjn I saw it. That link says it's not a good idea to parse with regex, and others in that thread say "don't listen to these guys", but there is no solution to my question there. – Ger Cas Mar 12 '20 at 15:38
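On the `ns0:` question in the comments: when ElementTree serializes a tree, it writes each namespace URI with a prefix from its own module-wide registry, and any URI it has no registered prefix for gets a generated name (`ns0`, `ns1`, and so on). The original `data-main-43` prefix is not kept, which is why the regex has to match `ns0:Node24`. A minimal sketch, using a one-element inline document and the hypothetical prefix `dm`, showing the generated prefix and how `ET.register_namespace` changes it (note that this changes the registry for the whole process):

```python
import xml.etree.ElementTree as ET

URI = "https://example.com/DATA-MAIN-XZ123"
XML = f'<data-main-43:DATAMAINXZ123 xmlns="{URI}" xmlns:data-main-43="{URI}"><Node24>03-c34ko</Node24></data-main-43:DATAMAINXZ123>'

# No prefix registered for URI yet, so tostring() invents "ns0"
before = ET.tostring(ET.fromstring(XML), encoding="unicode")
print(before)

# After registering the (hypothetical) prefix "dm", tostring() uses it instead
ET.register_namespace("dm", URI)
after = ET.tostring(ET.fromstring(XML), encoding="unicode")
print(after)
```

So any regex over the serialized text depends on whichever prefix ElementTree happens to pick, and it would also need a new pattern for every element name.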

As in the duplicate that mzjn referenced, just add the namespace URI to the element name, in ElementTree's `{uri}localname` form:

import xml.etree.ElementTree as ET

filename = 'ifile.xml'
tree = ET.parse(filename)
root = tree.getroot()

for neighbor in root.iter('{https://example.com/DATA-MAIN-XZ123}Node24'):
    print(neighbor.text)

Note: I also added `.text` to `neighbor` so that it prints the element's text (the requested result) rather than the Element object itself.
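If you'd rather not hard-code the URI, or want several element names at once (as asked in the comments under the other answer), `find`/`findall` also accept a prefix-to-URI mapping, and from Python 3.8 the `{*}` wildcard matches a local name in any namespace. A minimal sketch, assuming Python 3.8+ and a trimmed inline copy of the file; the prefix `dm` is an arbitrary name chosen here:

```python
import xml.etree.ElementTree as ET

XML = """<data-main-43:DATAMAINXZ123
    xmlns="https://example.com/DATA-MAIN-XZ123"
    xmlns:data-main-43="https://example.com/DATA-MAIN-XZ123">
  <Node1>
    <Node2>WTRT DDK</Node2>
    <Node23>
      <Node24>03-c34ko</Node24>
      <Node24>04-c64ko</Node24>
      <Node24>07-c54ko</Node24>
    </Node23>
  </Node1>
</data-main-43:DATAMAINXZ123>"""

root = ET.fromstring(XML)

# Read the default namespace URI off the root tag instead of hard-coding it
uri = root.tag[1:root.tag.index('}')]
ns = {'dm': uri}  # 'dm' is an arbitrary prefix used only in the search paths below

# Option 1: prefixed path plus a namespace mapping
via_prefix = [el.text for el in root.findall('.//dm:Node24', ns)]

# Option 2 (Python 3.8+): match Node24 in any namespace
via_wildcard = [el.text for el in root.findall('.//{*}Node24')]

print(via_prefix)    # ['03-c34ko', '04-c64ko', '07-c54ko']
print(via_wildcard)  # ['03-c34ko', '04-c64ko', '07-c54ko']

# Several element names at once, building "{uri}name" for each
for name in ('Node2', 'Node24'):
    print(name, [el.text for el in root.iter(f'{{{uri}}}{name}')])
```

With a real file, `root = ET.parse('ifile.xml').getroot()` would replace the `ET.fromstring(XML)` line.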

Daniel Haley