
My job is to parse XML files and retrieve various reports. I also create and edit XML files using etree in Python. Most of the time, I get stuck on files that use custom entities such as mdash, nbsp, and so on.

I searched and found one solution in the question "Python ElementTree support for parsing unknown XML entities?".

So I added the entity definition `[<!ENTITY nbsp "&#160;">]` and worked with that. It works, but I have to read each file as a string, add the entity definition to it, and only then carry on with my work.
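For reference, the workaround described above could be sketched roughly like this. The `ENTITIES` table and the `parse_with_entities` helper are names made up for illustration, the `doc` DOCTYPE name is arbitrary, and the sketch assumes the files have no DOCTYPE of their own.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity table kept in the script; extend it as needed.
ENTITIES = {"nbsp": "&#160;", "mdash": "&#8212;"}

def parse_with_entities(xml_text, entities=ENTITIES):
    """Inject a DOCTYPE declaring the entities, then parse the text."""
    decls = "".join('<!ENTITY %s "%s">' % (name, value)
                    for name, value in entities.items())
    # The parser is non-validating, so the DOCTYPE name ("doc" here)
    # does not have to match the root element.
    doctype = "<!DOCTYPE doc [%s]>" % decls
    # A DOCTYPE must come after the XML declaration, if there is one.
    match = re.match(r"<\?xml\s[^>]*\?>", xml_text)
    pos = match.end() if match else 0
    return ET.fromstring(xml_text[:pos] + doctype + xml_text[pos:])

root = parse_with_entities("<p>a&nbsp;b&mdash;c</p>")
print(repr(root.text))
```

The file on disk is never modified this way; only the in-memory string gets the extra declarations.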

Is this the only way? If I want to parse XML files that use custom entities without adding the definitions to the files, can I do that?

Is there a way to define those entities in the script and then parse the XML files?
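To illustrate what "defining the entities in the script" might look like, here is a minimal sketch that rewrites named references into numeric character references before parsing. It assumes the custom entities are the usual HTML ones (so the standard library's `html.entities.name2codepoint` table can stand in for the definitions); `to_char_refs` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML itself predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_char_refs(xml_text):
    """Replace known named entity references with numeric ones."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave unknown or built-in names alone
        return "&#%d;" % name2codepoint[name]
    return ENTITY_REF.sub(repl, xml_text)

root = ET.fromstring(to_char_refs("<p>a&nbsp;b &amp; c&mdash;d</p>"))
print(repr(root.text))
```

One caveat with this kind of plain text substitution: it also rewrites references that appear inside CDATA sections and comments, where they would otherwise be literal text.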

Shahul Hameed
  • If the files that you work with contain entity references like `&nbsp;` but no corresponding entity declarations, then the files are ill-formed and therefore not really XML files. – mzjn Mar 21 '18 at 16:11
  • I understand, but I can't help it. I have 100k+ XML files, with many more to come. I can't add those entities back to the XML files (that would be my last resort). I would like to know whether there is anything I can do other than adding them to the files or using the approach I mentioned. – Shahul Hameed Mar 21 '18 at 16:38
  • This workaround works with lxml but not with ElementTree, unfortunately: https://stackoverflow.com/a/9128457/407651 – mzjn Mar 21 '18 at 16:40

0 Answers