I'm trying to read some XML data for a project, but it just won't work. I have this code (the XML file it reads is shown below):

import time
from xml.etree.ElementTree import fromstring, ElementTree
import xml.etree.ElementTree as ET
ET.register_namespace('', "http://www.w3.org/2001/XMLSchema-instance")
ET.register_namespace('', "http://bison.connekt.nl/tmi8/kv6/msg")

while True:
    print("--------------------------------------------")
    tree = ET.parse("RET.xml")
    root = tree.getroot()
    print(root)
    for debug in root.findall(".//"): 
        print(debug.text)
    for line in root.findall('.Version'):
    print(line.text)
    print("--------------------------------------------")
    time.sleep(5)

It successfully finds the contents of all elements, but when I search for a specific element like 'Version' it won't return any content. This is the current output:


<Element '{http://bison.connekt.nl/tmi8/kv6/msg}VV_TM_PUSH' at 0x03D775A0>

RET

BISON 8.1.1.0
KV6posinfo
2020-12-04T21:22:56.1275145+01:00
ttt

RET
M007

2020-12-04

200180

0

HA8215

0

2020-12-04T21:22:56.1119143+01:00

SERVER

0
-920

--------------------------------------------

And this is the used XML file:

<?xml version="1.0" encoding="utf-8"?>
<VV_TM_PUSH xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://bison.connekt.nl/tmi8/kv6/msg">
<SubscriberID>
RET</SubscriberID><Version>
BISON 8.1.1.0</Version>
<DossierName>KV6posinfo</DossierName>
<Timestamp>2020-12-04T21:22:56.1275145+01:00</Timestamp>
<KV6posinfo>ttt
<ONSTOP>
<dataownercode>RET</dataownercode>
<lineplanningnumber>M007</lineplanningnumber>
<operatingday>2020-12-04</operatingday>
<journeynumber>200180</journeynumber>
<reinforcementnumber>0</reinforcementnumber>
<userstopcode>HA8215</userstopcode>
<passagesequencenumber>0</passagesequencenumber>
<timestamp>2020-12-04T21:22:56.1119143+01:00</timestamp>
<source>SERVER</source>
<vehiclenumber>0</vehiclenumber>
<punctuality>-920</punctuality>
</ONSTOP>
</KV6posinfo>
</VV_TM_PUSH>

I added 'ttt' inside the KV6posinfo tag for testing purposes.

Can anyone help?

Asocia
Jospuntnl

1 Answer

I see two flaws in your code:

  1. In the line for line in root.findall('.Version'): the dot is unnecessary. Version is a direct descendant of the root element, so the tag name alone describes the path (although the tag name by itself is still not sufficient, details below).
  2. The line print(line.text) is not indented, so the code as posted should have failed with an IndentationError.

And now, how to work properly with namespaced XML:

Note that your input file declares a default namespace (http://bison.connekt.nl/tmi8/kv6/msg), so every element without a prefix belongs to that namespace.
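A quick way to see the effect of the default namespace is to print the tags as ElementTree stores them. The sketch below uses a shortened inline copy of your file (the names XML, child_tags and bare_matches are mine, chosen only for illustration). Each tag comes back as the namespace URI in braces followed by the local name, which is why a bare 'Version' matches nothing.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, inlined for illustration.
XML = """<VV_TM_PUSH xmlns="http://bison.connekt.nl/tmi8/kv6/msg">
<SubscriberID>
RET</SubscriberID><Version>
BISON 8.1.1.0</Version>
</VV_TM_PUSH>"""

root = ET.fromstring(XML)

# ElementTree stores each tag as '{namespace-URI}local-name'.
child_tags = [child.tag for child in root]
print(child_tags)

# A bare tag name does not match any namespaced element.
bare_matches = root.findall('Version')
print(len(bare_matches))  # 0
```

This matches the root element shown in your own output, which is printed as '{http://bison.connekt.nl/tmi8/kv6/msg}VV_TM_PUSH' rather than plain VV_TM_PUSH.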

To refer to any element in this namespace, you have to:

  • define a namespace dictionary (key - prefix, value - URI), in your case only one "prefix:URI" pair,
  • precede the tag name with the namespace prefix and a colon,
  • pass the namespace dictionary as the second argument to e.g. findall.

So in your case replace the respective code fragment with:

ns = {'bis': 'http://bison.connekt.nl/tmi8/kv6/msg'}
for line in root.findall('bis:Version', ns):
    print(f'Version: {line.text}')

I added the Version: label to show what is being printed.

Consider also printing line.text.strip(), because the text of the Version element starts with a newline.
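Putting these pieces together, here is a minimal self-contained sketch. It assumes a shortened inline copy of your XML instead of reading RET.xml, and the prefix 'bis' is arbitrary (any prefix works as long as the dictionary maps it to the right URI). It also shows that a nested element needs the prefix on every step of the path, and that the URI can be written out in braces if you prefer not to use a dictionary.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, inlined for illustration.
XML = """<VV_TM_PUSH xmlns="http://bison.connekt.nl/tmi8/kv6/msg">
<SubscriberID>
RET</SubscriberID><Version>
BISON 8.1.1.0</Version>
<KV6posinfo>ttt
<ONSTOP>
<lineplanningnumber>M007</lineplanningnumber>
<punctuality>-920</punctuality>
</ONSTOP>
</KV6posinfo>
</VV_TM_PUSH>"""

# Prefix -> URI mapping; the prefix name is a free choice.
ns = {'bis': 'http://bison.connekt.nl/tmi8/kv6/msg'}
root = ET.fromstring(XML)

# Direct child of the root: prefixed tag name plus the namespace dictionary.
version = root.find('bis:Version', ns).text.strip()
print(f'Version: {version}')

# Nested element: every step of the path carries the prefix.
punctuality = root.find('bis:KV6posinfo/bis:ONSTOP/bis:punctuality', ns).text
print(f'Punctuality: {punctuality}')

# Equivalent lookup without a dictionary, with the URI spelled out in braces.
version_alt = root.find('{http://bison.connekt.nl/tmi8/kv6/msg}Version').text.strip()
```

In your script you would keep tree = ET.parse("RET.xml") and root = tree.getroot(), and only change the lookups.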

Valdi_Bo