I have a number of XML files I'd like to process with a script, converting them from whatever encoding they're in to UTF-8.
Using the code given in this great answer I can do the conversion, but how can I read the encoding given in the XML header?
For example, I have many files which are already in UTF-8, which should be left alone:
<?xml version="1.0" encoding="utf-8"?>
However, I have a lot of files which do need to be converted:
<?xml version="1.0" encoding="windows-1255"?>
How can I detect the XML encoding specified in the headers of these files in Python? Better still, after I detect the encoding and re-encode a file, how can I then change its XML header to read "utf-8" so that it is not processed again in the future?
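To make the goal concrete, here is a rough sketch of the behaviour I'm after, not a tested solution. It assumes the declaration sits at the very start of the file with no BOM, and that the source encoding is ASCII-compatible (so not UTF-16 or UTF-32). The names `DECL_RE`, `detect_xml_encoding`, `to_utf8` and `convert_file` are my own placeholders.

```python
import re

# Captures the XML declaration in three parts: everything up to the opening
# quote of the encoding value, the encoding name itself, and the remainder.
# It works on bytes, which is only safe for ASCII-compatible encodings.
DECL_RE = re.compile(
    rb'^(<\?xml[^>]*?encoding\s*=\s*["\'])([A-Za-z][A-Za-z0-9._-]*)(["\'][^>]*\?>)'
)


def detect_xml_encoding(data, default="utf-8"):
    """Return the encoding named in the XML declaration, or `default`."""
    match = DECL_RE.match(data)
    return match.group(2).decode("ascii") if match else default


def to_utf8(data):
    """Return `data` re-encoded as UTF-8 with the declaration updated."""
    encoding = detect_xml_encoding(data)
    if encoding.lower() in ("utf-8", "utf8"):
        return data  # already UTF-8 (or no declaration): leave alone
    utf8 = data.decode(encoding).encode("utf-8")
    # The declaration is plain ASCII, so it can be patched in the UTF-8 bytes.
    return DECL_RE.sub(rb"\g<1>utf-8\g<3>", utf8, count=1)


def convert_file(path):
    """Rewrite the file at `path` in place, only if its content changes."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = to_utf8(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
```

Calling `convert_file` on each path would then leave the UTF-8 files untouched and rewrite the others. I suspect a real XML parser would handle edge cases (BOMs, UTF-16, a missing declaration) better than a regular expression, which is partly why I'm asking.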