I have a number of scripts that fetch external data and update parts of XML files.

I use lxml in my Python script, and it saves character references in decimal notation, for example:

$ cat input.xml
<data>
  <record text="&#x41f;&#x440;&#x438;&#x432;&#x435;&#x442;">
  </record>
</data> 

$ python
>>> from lxml import etree
>>> tree = etree.parse("input.xml")
>>> tree.write("out.xml")

$ cat out.xml
<data>
  <record text="&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;">
  </record>
</data>

Other scripts use the hex form, <record text="&#x41f;&#x440;&#x438;&#x432;&#x435;&#x442;">, so there is an endless series of changes in git for these files even when nothing has actually changed.

How can I tell lxml to save character references in hex form (&#x41f;) in a python script?

mzjn
Fedor Dikarev
  • Please [edit] your question and include the code you use to write the XML files. –  Jul 22 '18 at 17:07
  • @intentionallyleftblank I've added code sample and input/output xml files – Fedor Dikarev Jul 22 '18 at 17:51
  • Typing your question into a well-known search engine led me to [this](https://answers.launchpad.net/lxml/+question/61584), the exact same question, asked 9 years ago. The solution proposed is to post-process a Unicode version of your XML. ... – Jongware Jul 22 '18 at 20:45
  • .. although I entirely agree with that answerer's initial advice ".. It's always bad style to make applications depend on a specific XML serialisation done by a specific tool. That's exactly what canonical XML (C14N) was designed for." (scoder, 2009-02-19) – Jongware Jul 22 '18 at 20:46
  • @usr2564301 The application doesn't depend on it, but git depends on the actual serialisation. One day I will change the current scheme, but right now I have to support the current mess of scripts :( – Fedor Dikarev Jul 23 '18 at 07:59
  • Fedor, I fully understand :P Would the suggested solution work for you, and can you write it by yourself? (If you can, by all means do post it as an answer. If you want to use it but can't, I'll have a look.) – Jongware Jul 23 '18 at 08:25
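
The post-processing idea mentioned in the comments could be sketched roughly as follows. This is only an illustration, not a built-in lxml option: the names `decimal_refs_to_hex` and `write_with_hex_refs` are hypothetical, and it assumes the tree is serialized as ASCII (the default in the question's example), so that non-ASCII characters appear as decimal references which a regular expression can then rewrite.

```python
import re

# Matches decimal character references such as &#1055;
_DECIMAL_REF = re.compile(r"&#([0-9]+);")


def decimal_refs_to_hex(xml_text):
    """Rewrite decimal character references (&#1055;) as lowercase hex (&#x41f;)."""
    return _DECIMAL_REF.sub(lambda m: "&#x%x;" % int(m.group(1)), xml_text)


def write_with_hex_refs(tree, path):
    """Hypothetical helper: serialize an lxml tree, then post-process the text."""
    # Imported here so the regex helper above can be used without lxml installed.
    from lxml import etree

    # With an ASCII target encoding, non-ASCII characters are emitted
    # as decimal character references.
    raw = etree.tostring(tree, encoding="ascii").decode("ascii")
    with open(path, "w", encoding="ascii") as f:
        f.write(decimal_refs_to_hex(raw))
```

With the file from the question, this might be used as `write_with_hex_refs(etree.parse("input.xml"), "out.xml")`. One caveat: the substitution is purely textual, so a literal string like `&#1055;` inside a CDATA section or a comment would be rewritten as well.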

0 Answers