I've got some malformed XML files that I'm processing with Python, and I need to figure out what's wrong with them (i.e. what the errors are) without actually looking at the data (the files contain sensitive client data).
I figure there should be a way to sanitize the XML (i.e. remove all content in all nodes) but keep the tags, so that I can see any structural issues.
However, ElementTree doesn't return any detailed information about mismatched tags - just a line number and a character position, which is useless if I can't look at the original XML.
Does anyone know how I can either sanitize the XML so I can view it, or get more detailed error messages for malformed XML (that won't return tag contents)? I could write a custom parser to strip content, but I wanted to exhaust other options first.
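
For reference, the custom-parser fallback I have in mind would be roughly the sketch below. It is not a real XML parser, just a regex pass that replaces every character of text content and every quoted attribute value with `x`, one for one, so the line and column numbers from ElementTree still point at the right place in the masked copy. The names `sanitize_xml` and `report_error` are my own, and it assumes attribute values are quoted and contain no `>`; processing instructions, DOCTYPE declarations and tag names are left unmasked, and a stray `&` in the text gets masked away, so that kind of error would be hidden.

```python
import re
import xml.etree.ElementTree as ET

# Anything that looks like markup: comments, CDATA sections, then ordinary tags.
TAG_RE = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<[^<>]*>", re.DOTALL)
# Quoted attribute values inside a tag.
ATTR_VALUE_RE = re.compile(r"\"[^\"]*\"|'[^']*'")


def _mask(text):
    """Replace each character with 'x', keeping whitespace and stray < or >."""
    return re.sub(r"[^\s<>]", "x", text)


def _mask_tag(tag):
    """Mask the sensitive parts of one markup token without changing its length."""
    if tag.startswith("<!--"):
        return "<!--" + _mask(tag[4:-3]) + "-->"
    if tag.startswith("<![CDATA["):
        return "<![CDATA[" + _mask(tag[9:-3]) + "]]>"
    if tag.startswith("<?"):
        # Assumption: the XML declaration / processing instructions are not sensitive.
        return tag
    return ATTR_VALUE_RE.sub(
        lambda m: m.group(0)[0] + _mask(m.group(0)[1:-1]) + m.group(0)[-1], tag
    )


def sanitize_xml(raw):
    """Return a same-length copy of raw with text and attribute values masked."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(raw):
        out.append(_mask(raw[pos:m.start()]))  # text between tags
        out.append(_mask_tag(m.group(0)))      # the tag itself
        pos = m.end()
    out.append(_mask(raw[pos:]))               # trailing text after the last tag
    return "".join(out)


def report_error(raw):
    """Parse the masked copy; return the error plus the masked line, or None."""
    safe = sanitize_xml(raw)
    try:
        ET.fromstring(safe)
    except ET.ParseError as err:
        line, _col = err.position
        return f"{err}\n{safe.splitlines()[line - 1]}"
    return None
```

Calling `report_error(open(path, encoding="utf-8").read())` on a file (with `path` being whatever file is failing) would then give the parser message together with the masked version of the offending line, which should be safe to look at or paste into a bug report.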