I have written a script to make some adjustments for Visual Studio project files which have a xml structure. However, when I write the modified version back to the file, I expectedly end up with the encoding and line endings that can be differ from the originals. Therefore, for example, bitbucket sees the whole file has been rewritten.
I do want the script to detect and use the original parameters. However, the ansewrs on detecting the encoding a very dated.
Is there a standard/common and reliable way for modern Python to accomplish this task?
This is just for an exposition to clarify what I am actually doing
detected_encoding = how_do_i_detect_encoding(args.project_path) # not an actual function
detected_line_endings = also_get_line_endings(args.project_path) # not an actual function
proj_xml = lxml.etree.parse(str(args.project_path))
do_necessary_adjustments(proj_xml) # not an actual function
# etree.tostring returns byte, so binary mode is required
with open(args.project_path, 'wb') as proj_file:
proj_file.write(codecs.BOM_UTF8)
proj_file.write(etree.tostring(proj_xml, # of type lxml.etree
xml_declaration=True,
pretty_print=True,
encoding=detected_encoding).replace(UNIX_LINE_ENDING, detected_line_endings) # I know that LF is generated
This is an excerpt from the docs:
xml.etree.ElementTree.tostring(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True) Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding 1 is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements has the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.