How to preserve the original encoding and line endings when writing to a file?

Question

I have written a script to make some adjustments for Visual Studio project files which have a xml structure. However, when I write the modified version back to the file, I expectedly end up with the encoding and line endings that can be differ from the originals. Therefore, for example, bitbucket sees the whole file has been rewritten.

I do want the script to detect and use the original parameters. However, the ansewrs on detecting the encoding a very dated.

Is there a standard/common and reliable way for modern Python to accomplish this task?

This is just for an exposition to clarify what I am actually doing

    detected_encoding = how_do_i_detect_encoding(args.project_path) # not an actual function
    detected_line_endings = also_get_line_endings(args.project_path) # not an actual function

    proj_xml = lxml.etree.parse(str(args.project_path))

    do_necessary_adjustments(proj_xml) # not an actual function

    # etree.tostring returns byte, so binary mode is required
    with open(args.project_path, 'wb') as proj_file:
        proj_file.write(codecs.BOM_UTF8)
        proj_file.write(etree.tostring(proj_xml, # of type lxml.etree
                                           xml_declaration=True,
                                           pretty_print=True,
                                           encoding=detected_encoding).replace(UNIX_LINE_ENDING, detected_line_endings) # I know that LF is generated

This is an excerpt from the docs:

xml.etree.ElementTree.tostring(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True) Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding 1 is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements has the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

While my answer on how to preserve line endings is aimed at Python 2, all you need to know is that it works exactly the same in Python 3, where `open()` is the same thing as `io.open()`. — Martijn Pieters, Jan 29 '22 at 02:38

How to preserve the original encoding and line endings when writing to a file?

0 Answers0