7

I have a bunch of XML files that use namespace prefixes without the corresponding namespace declarations.

Stuff like:

<tal:block tal:condition="foo">
...
</tal:block>

or:

<div i18n:domain="my-app">
...

I know where those prefixes come from, and I tried the following, but without success:

from lxml import etree as ElementTree

ElementTree.register_namespace("i18n", "http://namespaces.zope.org")
ElementTree.register_namespace("tal", "http://xml.zope.org/namespaces/tal")

with open(path) as fp:
    tree = ElementTree.parse(fp)

but lxml still chokes with:

lxml.etree.XMLSyntaxError: Namespace prefix i18n for domain on div is not defined, line 4, column 20

I know I can use ElementTree.XMLParser(recover=True), but I would like to keep the prefixes, which this method doesn't do.

Any idea?

Francis Upton IV
Jonathan Ballet

1 Answer

4

XML that uses undeclared prefixes is not namespace-well-formed, so no namespace-aware XML parser is going to accept it as is.

Your best bet (other than fixing the XML files themselves) is to programmatically modify the XML source to add the namespace declarations to the root element, using nothing more than your language's string handling. Add xmlns:tal="http://xml.zope.org/namespaces/tal", etc. to the root element before you give the XML to the parser. The parser should then handle it without complaint, and without any namespace registration.
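A minimal sketch of that idea in Python, under a few assumptions: the helper name declare_namespaces is made up for illustration, the i18n URI is simply the one from the question, and the first thing in the file that looks like a start tag is taken to be the root element (so a comment containing "<something" before the root would fool it). It also assumes the root does not already declare any of these prefixes, since a duplicate xmlns attribute would be a parse error. The import falls back to the standard library so the snippet runs without lxml installed.

```python
import re

try:
    from lxml import etree
except ImportError:  # standard-library fallback, same fromstring() call
    import xml.etree.ElementTree as etree

# Prefix -> URI map; the i18n URI is copied from the question as-is.
NAMESPACES = {
    "tal": "http://xml.zope.org/namespaces/tal",
    "i18n": "http://namespaces.zope.org",
}

# "<" followed by a name start character: skips "<?xml ...?>", "<!--" and
# "<!DOCTYPE", because "?" and "!" cannot start an element name.
_FIRST_START_TAG = re.compile(r"<[A-Za-z_][\w.:-]*")


def declare_namespaces(source, namespaces):
    """Insert xmlns:prefix="uri" attributes right after the root tag name."""
    declarations = "".join(
        f' xmlns:{prefix}="{uri}"' for prefix, uri in namespaces.items()
    )
    # A function replacement avoids backslash handling in the URI text;
    # count=1 touches only the first start tag found.
    return _FIRST_START_TAG.sub(
        lambda match: match.group(0) + declarations, source, count=1
    )


source = (
    '<div i18n:domain="my-app">'
    '<tal:block tal:condition="foo">text</tal:block>'
    "</div>"
)
patched = declare_namespaces(source, NAMESPACES)
root = etree.fromstring(patched)
```

After this, root is an ordinary parsed element whose prefixed children and attributes are addressed by their expanded names, for example root.get("{http://namespaces.zope.org}domain"). The pattern only locates where the root tag name ends; it does not try to parse the markup, which is why the assumptions above matter.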

Francis Upton IV
  • What is a safe way to programmatically add attributes to an element without a parser? Not regexp: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – yig Mar 17 '20 at 14:51