
I wrote a Python script to add a node to an xml file. Although it works fine, the xml comes out oddly formatted. For example, a new node is added, and if it's the last one for a parent node, the parent node closing tag is now immediately after the new node's closing tag. This is fine technically, but we wanted the xml to "look" good too.

So I installed the xmlformatter add-on for Python. Although it lets me format the look of the xml properly, it is changing the encoded ampersands (`&amp;`) in my text values to actual ampersands (`&`), thus breaking the xml.

I tried googling this but couldn't find any mention of it. Can anybody suggest a way to prevent this? I tried changing the encoding_output to various values or eliminating it altogether, but nothing helps.

Thanks...

RMittelman
  • Do you mean it is taking this `&amp;` and turning it into `&`? And is it doing it everywhere (Classes, Parents, Children, Attributes, etc.) or just certain parts of the XML? – dblclik Aug 26 '16 at 16:07
  • Yes, @dblclik, exactly. Here is how it should look: ` `. Happens wherever `&amp;` appears. – RMittelman Aug 26 '16 at 16:20
  • I know it's low-tech, but if we suppose that your output was stored as `pretty_xml = xmlformatter(orig_xml)` (sorry, I'm using theoretical functions here to give you the idea), could you then use Python's `re` package to just do `final_xml = re.sub('&', '&amp;', pretty_xml)`? – dblclik Aug 26 '16 at 16:26
  • Can't use that, because a literal `&` also appears inside an html tag within a CDATA section, and that would be "fixed" too. Best would be to figure out why xmlformatter is "formatting" the `&amp;`, right? – RMittelman Aug 26 '16 at 16:46
  • Did you try the `preserve` argument to the `xmlformatter.Formatter()` initialization call? It looks like you could pass `preserve=['&']` and it should preserve that element (sorry if that still doesn't work, I'm going off of their documentation and it's _meh_ at best) – dblclik Aug 26 '16 at 16:53
  • Do you have an actual working string you could provide? I installed xmlformatter so I could try to help out here if you have a sample XML string or file! – dblclik Aug 26 '16 at 16:58
  • `preserve` may work, but it's not a good idea, because what about the next encoding we use? We would have to tweak the script each time we caught an error. – RMittelman Aug 26 '16 at 17:10
  • Can't send proprietary xml file, but here's a snippet: ` ` Thanks... – RMittelman Aug 26 '16 at 17:13
  • Out of curiosity, does this answer here work? http://stackoverflow.com/a/1206856/1607105 It uses the built-in `xml.dom.minidom` module, and its `toprettyxml()` method seems to leave the `&amp;` alone – dblclik Aug 26 '16 at 19:45
  • I saw that one also. They want me to use lxml, not dom object (don't know if they are mutually exclusive or not). I'm going to quit on this thread. I found out if I send the parser `parser = etree.XMLParser(remove_blank_text=True)` it properly formats and I don't need xmlformatter anymore. Thanks for all the help! – RMittelman Aug 26 '16 at 20:46
  • Were you able to figure out a solution to this? I've started looking at `xml.parsers.expat.ParserCreate()` because I believe the issue is in there. – Marcel Wilson Sep 27 '17 at 15:15
  • No, never figured it out, but see my last comment above. Don't need xmlformatter anymore. – RMittelman Sep 28 '17 at 16:18
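
For anyone landing here, the standard-library route mentioned in the comments can be sketched roughly as below. This is only an illustration, not the lxml setup the asker ended up with: the sample XML and the `pretty` helper are made up (the real file was never posted), and it assumes the input has no existing indentation, since `toprettyxml()` tends to add blank lines when whitespace text nodes are already present.

```python
from xml.dom import minidom

# Hypothetical sample: the original file was proprietary and never posted.
SAMPLE = (
    '<catalog><item name="A &amp; B"><note>Tom &amp; Jerry</note>'
    '<html><![CDATA[<b>R&D</b>]]></html></item></catalog>'
)


def pretty(xml_text: str) -> str:
    """Re-indent XML with minidom (hypothetical helper name).

    minidom escapes '&' again when it writes text and attribute values,
    and writes CDATA sections back out as CDATA.
    """
    doc = minidom.parseString(xml_text)
    return doc.toprettyxml(indent="  ")


result = pretty(SAMPLE)
print(result)

# Parsing the output again is a cheap check that it is still well-formed.
minidom.parseString(result)
```

In this sketch the `&amp;` in the text node and in the attribute should come back out escaped, while the bare `&` inside the CDATA section is left as it was, which is the case the regex idea above could not handle.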

0 Answers