Python XML syntax checking - Enforce NO '>' in the text of an element

Question

I have a question here. I have been using Python's Etree to parse XML and do syntax checking on it. The problem I am having is it will throw an error when it is unable to parse the XML, but it is not good about indicating where the mistake was actually first made. I realized what I kind of need is to be able to enforce a rule that says there is to be no '>' in the text of an XML element (which for my XML purposes is correct and sound). Is there a way to tell Etree to do this when parsing the XML? I know there is libxml, but if I am to use a library that doesn't come by default with Python 2.75, then I will need the source code as I am not allowed to install additional Python libraries where I work. So, an answer to the question about enforcing no '>' in the text of an XML element, and some suggestions on how to spot the line where a mistake is first made in an XML document; such as forgetting the opening '<' in a closing XML tag. Any help would be much appreciated! Thanks.

You must write well-formed XML if you wish to use XML tools. If you're unhappy with the error messages from the tools you choose to use, try different tools. Realize, however, that some parsing rule violations have many causes, and parsers typically don't enumerate all possible corrections as that'd be impractical in the general case. — kjhughes, Sep 04 '18 at 16:54
Consider wrapping text with [CData](https://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean) used for just this purpose. — Parfait, Sep 04 '18 at 19:03

score 1 · Answer 1 · answered Sep 04 '18 at 21:08

I'm not sure about your headline question. Why do you want to enforce a rule that ">" does not appear in text, since there is no such rule in XML?

If you're not happy with the diagnostics you're getting from an XML parser then the only real option is to try a different parser (though do check that you are extracting all the information that's available - I don't know Python's ETree, but some parsers hide the diagnostics in obscure places).

But there are some limitations. If a start tag is missing, then no parser is going to be able to tell you where it should have been; it can only tell you where the unmatched end tag is. So asking it to tell you "where the mistake was first made" is asking too much.

Python XML syntax checking - Enforce NO '>' in the text of an element

1 Answers1