I need to decide whether a text file has to be converted, given its known encoding and the desired output encoding.
If the text is US-ASCII, I don't need to convert it when the output encoding is ASCII, UTF-8, Latin-1, ...
Obviously I need to convert a US-ASCII file to UTF-16 or UTF-32.
A list of standard encodings exists at
http://www.iana.org/assignments/character-sets/character-sets.xml
A conversion is necessary if:
- the minimum character size is greater than 1 byte, or
- the first 128 code points (0-127) are not encoded as the same bytes as in US-ASCII.
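The two conditions above can be tested empirically rather than looked up. The following is a minimal sketch using Python's standard `codecs` machinery, not Qt5: an encoding counts as an ASCII superset here if encoding all 128 ASCII characters, and decoding all 128 ASCII bytes, are both the identity. The helper names `is_ascii_superset` and `needs_conversion` are hypothetical, and the probe is a heuristic that may not be meaningful for stateful encodings such as the ISO-2022 family.

```python
import codecs

# All 128 US-ASCII code points (0x00-0x7F), as text and as raw bytes.
ASCII_TEXT = "".join(chr(i) for i in range(128))
ASCII_BYTES = bytes(range(128))


def is_ascii_superset(encoding: str) -> bool:
    """Hypothetical helper: True if US-ASCII is byte-identical in `encoding`."""
    try:
        # Both directions must round-trip unchanged; a multi-byte encoding
        # (UTF-16, UTF-32) or a remapped one (EBCDIC) fails the comparison.
        return (ASCII_TEXT.encode(encoding) == ASCII_BYTES
                and ASCII_BYTES.decode(encoding) == ASCII_TEXT)
    except (UnicodeError, LookupError):
        # Unknown names, non-text codecs and codecs rejecting ASCII count as "no".
        return False


def needs_conversion(source: str, target: str) -> bool:
    """Hypothetical helper: must a file in `source` be re-encoded for `target`?"""
    # codecs.lookup() normalizes aliases, e.g. "US-ASCII" -> "ascii".
    src = codecs.lookup(source).name
    tgt = codecs.lookup(target).name
    if src == tgt:
        return False
    # A US-ASCII file only needs relabelling if the target is an ASCII superset.
    if src == "ascii":
        return not is_ascii_superset(tgt)
    # Any other pair is assumed to need a real conversion.
    return True


if __name__ == "__main__":
    for target in ("UTF-8", "latin1", "UTF-16", "UTF-32", "cp500"):
        print(target, needs_conversion("US-ASCII", target))
```

The design choice is to let the codec itself answer the question, so the result stays consistent with whatever the converter actually does, at the cost of only covering codecs that this Python installation knows about.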
I'd like to know:
- Is there a similar list with details (byte length, ASCII compatibility) about the implementation of each encoding?
- A list containing only the codecs supported by Qt5 would already be enough.
EDIT
I already found an answer to the following question in "Character set that is not a superset of ASCII":
- Are all codecs based on 8-bit or variable-length 8-bit units supersets of ASCII?
- In other words: can US-ASCII text be interpreted as any such encoding?
Instead, it would be helpful to know:
- Is there a list of character sets which are supersets of ASCII?
This looks promising: mime.charsets ("list of character sets which are ASCII supersets"), but I couldn't find an actual mime.charsets file.
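In the absence of a mime.charsets file, a list of ASCII supersets can at least be generated for one codec registry. The sketch below does this for Python's own codecs (not the IANA registry and not Qt5), using the same round-trip heuristic: encode all 128 ASCII characters and decode all 128 ASCII bytes, and keep the codec only if both are unchanged. The function name `ascii_superset_codecs` is hypothetical; the names it returns are Python codec module names such as `utf_8` or `latin_1`. The same probe could in principle be run over the names returned by Qt5's `QTextCodec::availableCodecs()`, but that is not shown here.

```python
import encodings.aliases

# All 128 US-ASCII code points, as text and as the identical raw bytes.
ASCII_TEXT = "".join(map(chr, range(128)))
ASCII_BYTES = ASCII_TEXT.encode("ascii")


def ascii_superset_codecs() -> list:
    """Hypothetical helper: Python codecs in which US-ASCII is byte-identical."""
    result = []
    # The alias table maps many spellings onto a smaller set of codec modules.
    for name in sorted(set(encodings.aliases.aliases.values())):
        try:
            if (ASCII_TEXT.encode(name) == ASCII_BYTES
                    and ASCII_BYTES.decode(name) == ASCII_TEXT):
                result.append(name)
        except (UnicodeError, LookupError):
            # Missing on this platform, not a text codec, or rejects ASCII input.
            continue
    return result


if __name__ == "__main__":
    print("\n".join(ascii_superset_codecs()))
```

Because the check is empirical, the output depends on the Python version and platform, and it says nothing about stateful encodings where escape sequences can change the meaning of later bytes.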