I have written a script that makes some adjustments to Visual Studio project files, which have an XML structure. However, when I write the modified version back to the file, I end up, as expected, with an encoding and line endings that can differ from the originals. As a result, Bitbucket, for example, shows the whole file as rewritten.

I want the script to detect and reuse the original parameters. However, the answers I have found on detecting the encoding are very dated.

Is there a standard/common and reliable way for modern Python to accomplish this task?

The following is just an illustration to clarify what I am actually doing:

    import codecs

    import lxml.etree

    UNIX_LINE_ENDING = b'\n'  # lxml serializes with LF line endings

    detected_encoding = how_do_i_detect_encoding(args.project_path)  # not an actual function
    detected_line_endings = also_get_line_endings(args.project_path)  # not an actual function, expected to return bytes

    proj_xml = lxml.etree.parse(str(args.project_path))

    do_necessary_adjustments(proj_xml)  # not an actual function

    # lxml.etree.tostring returns bytes here, so binary mode is required
    with open(args.project_path, 'wb') as proj_file:
        proj_file.write(codecs.BOM_UTF8)  # note: the BOM is written unconditionally
        serialized = lxml.etree.tostring(proj_xml,
                                         xml_declaration=True,
                                         pretty_print=True,
                                         encoding=detected_encoding)
        # I know that LF is generated, so swap in the original line endings
        proj_file.write(serialized.replace(UNIX_LINE_ENDING, detected_line_endings))
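For illustration, the two placeholder functions could be approximated along the following lines. This is only a minimal sketch under stated assumptions, not an established recipe: `detect_bom` and `detect_line_ending` are hypothetical names, the BOM check only recognizes the UTF-8, UTF-16 and UTF-32 byte order marks, and counting line endings on raw bytes is only meaningful for ASCII-compatible encodings such as UTF-8. For a file without a BOM, the encoding named in the XML declaration should also be available from lxml as `proj_xml.docinfo.encoding` after parsing.

```python
import codecs

# Hypothetical helpers for illustration; they are not part of lxml
# or the standard library.

# UTF-32 must be checked before UTF-16, because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(raw: bytes):
    """Return (bom_bytes, codec_name), or (b"", None) if no BOM is found."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return bom, name
    return b"", None


def detect_line_ending(raw: bytes) -> bytes:
    """Return the dominant line ending, assuming an ASCII-compatible encoding.

    Lone CR line endings are ignored; ties resolve to LF.
    """
    crlf = raw.count(b"\r\n")
    lf_only = raw.count(b"\n") - crlf
    return b"\r\n" if crlf > lf_only else b"\n"
```

Both helpers take the raw file contents, for example from `pathlib.Path(args.project_path).read_bytes()`, so the BOM can be written back only when the original file actually had one.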

This is an excerpt from the docs:

`xml.etree.ElementTree.tostring(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)`

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements has the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

Sergey Kolesnik
  • While my answer on how to preserve line endings is aimed at Python 2, all you need to know is that it works exactly the same in Python 3, where `open()` is the same thing as `io.open()`. – Martijn Pieters Jan 29 '22 at 02:38
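The behaviour the comment refers to can be sketched as follows, assuming the file is handled in text mode rather than as bytes. Passing `newline=""` to the built-in `open()` disables newline translation in both directions, so whatever line endings the file contains are read and written unchanged. The helper names and the default `encoding` value are hypothetical and chosen only for this example.

```python
def read_preserving_newlines(path, encoding="utf-8"):
    """Read text without translating \r\n or \r to \n."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


def write_preserving_newlines(path, text, encoding="utf-8"):
    """Write text without translating \n to the platform line separator."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)
```

With this approach the line endings never need to be detected explicitly, as long as the modified text keeps the separators it was read with.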

0 Answers