I'm trying to parse an XML document that contains a number of undefined entities. They cause a ParseError when I run my code, which is as follows:

import xml.etree.ElementTree as ET

tree = ET.parse('cic.fam_lat.xml')
root = tree.getroot()

while True:
    try:
        for name in root.iter('name'):
            print(root.tag, name.text)
    except xml.etree.ElementTree.ParseError:
        pass

for name in root.iter('name'):
    print(name.text)

An example of the error is shown below. The document contains a number of undefined entities, and each one raises the same error: [image: error description]

I just want to ignore them rather than go in and edit out each one. How should I edit my exception handling to catch these error instances? (i.e., what am I doing wrong?)

Jean Hominal
Daniel

2 Answers


There are some workarounds, like defining custom entities, suggested at:

But, if you are able to switch to lxml, its XMLParser() can work in the "recover" mode that would "ignore" the undefined entities:

import lxml.etree as ET

parser = ET.XMLParser(recover=True)
tree = ET.parse('cic.fam_lat.xml', parser=parser)
root = tree.getroot()

for name in root.iter('name'):
    print(root.tag, name.text)

(worked for me - got the tag names and texts printed)
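If installing lxml is not an option, a rougher standard-library alternative is to strip the undefined entity references from the text before handing it to ElementTree. This is a different technique from the recover mode above, and only a minimal sketch: it assumes that dropping those references entirely is acceptable, that the document does not declare its own entities in an internal DTD, and that no `&name;` sequences appear inside CDATA sections. The helper name `parse_ignoring_entities` and the sample string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The five entities that XML predefines; any other named entity is stripped.
PREDEFINED = ("amp", "lt", "gt", "apos", "quot")

# Matches &name; unless name is one of the predefined entities.
# Numeric references such as &#160; are left alone because '#' cannot start a name here.
UNDEFINED_ENTITY = re.compile(
    r"&(?!(?:%s);)[A-Za-z_][\w.-]*;" % "|".join(PREDEFINED)
)

def parse_ignoring_entities(xml_text):
    """Drop undefined named entity references, then parse (hypothetical helper)."""
    cleaned = UNDEFINED_ENTITY.sub("", xml_text)
    return ET.fromstring(cleaned)

# Hypothetical sample: one undefined entity and one predefined entity.
sample = "<doc><name>Cicero&unknown; &amp; Atticus</name></doc>"
root = parse_ignoring_entities(sample)
names = [name.text for name in root.iter("name")]
print(names)
```

For a file on disk, the text would first have to be read into a string with `open()` using whatever encoding the file actually has, which is something to check rather than assume.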

alecxe
  • Excellent, thank you! Yeah, lxml seems to be the way to go -- recover mode worked perfectly. Now just to figure out how to get to a certain parent tag from each instance of ... – Daniel Dec 21 '17 at 04:55
    This doesn't really answer the question. Why is the `ParseError` not caught? – LondonRob Nov 19 '19 at 15:42
  • @LondonRob: The exception is thrown already at `tree = ET.parse('cic.fam_lat.xml')`. The document is ill-formed because of the undefined entity and ElementTree refuses to parse it. – mzjn Oct 27 '21 at 11:56

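You can catch the exception by referencing ParseError through the ET alias, because `import xml.etree.ElementTree as ET` does not bind the name `xml`. The call that parses the file has to be inside the try block, since that is where the error is raised:

import xml.etree.ElementTree as ET

try:
    # The ParseError is raised here, while the file is being parsed
    tree = ET.parse('cic.fam_lat.xml')
except ET.ParseError as err:
    # Report the problem instead of crashing
    print('Could not parse file:', err)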

This is on Python 3.7.10, Windows 10.
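To see where the exception comes from without needing the original file, here is a small sketch that parses two in-memory strings, one well-formed and one containing an undefined entity. The helper name `try_parse` and both sample strings are made up for illustration. Note that catching the error only lets the program continue; it does not produce a tree for the ill-formed document.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return the root element, or None if the text cannot be parsed."""
    try:
        # Parsing is the step that raises ParseError, not iteration
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        print("could not parse:", err)
        return None

# Hypothetical samples: the second one references an entity that is never defined.
good = try_parse("<doc><name>Cicero</name></doc>")
bad = try_parse("<doc><name>Cicero &unknown;</name></doc>")

if good is not None:
    for name in good.iter("name"):
        print(good.tag, name.text)
```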

Sworup Shakya
jdelahanty