By most accounts, one ought to be able to convert a UTF-8 file to Latin-1 (ISO-8859-1) with a trivial invocation of iconv such as:
iconv -c -f utf-8 -t ISO-8859-1//TRANSLIT
However, this does not handle accented characters properly. Consider, for example:
$ echo $LC_ALL
C
$ cat Gonzalez.txt
González, M.
$ file Gonzalez.txt
Gonzalez.txt: UTF-8 Unicode text
$ iconv -c -f utf-8 -t ISO-8859-1//TRANSLIT < Gonzalez.txt > out
$ file out
out: ASCII text
$ cat out
Gonzalez, M.
I've tried several variations of the above, but none of them handles the accented "a" properly, even though Latin-1 does have an accented "a" (á, byte 0xE1).
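Whether iconv can produce a Latin-1 "á" depends on which code points the file actually holds. The sketch below lists the non-ASCII code points of two hypothetical UTF-8 byte strings: one with the precomposed U+00E1 LATIN SMALL LETTER A WITH ACUTE, which Latin-1 encodes as 0xE1, and one with a plain "a" followed by U+0301 COMBINING ACUTE ACCENT, which has no Latin-1 equivalent. The Python output further down (Gonza?lez) suggests, but does not prove, that Gonzalez.txt uses the second, decomposed form; passing the file's raw bytes to the hypothetical helper `non_ascii_code_points` would settle it.

```python
import unicodedata
from typing import List


def non_ascii_code_points(raw: bytes) -> List[str]:
    """Decode UTF-8 bytes and describe every non-ASCII code point."""
    text = raw.decode("utf-8")
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"
        for ch in text
        if ord(ch) > 0x7F
    ]


# Hypothetical stand-ins for the two ways "González" can be stored in UTF-8.
precomposed = b"Gonz\xc3\xa1lez, M.\n"  # U+00E1, representable in Latin-1
decomposed = b"Gonza\xcc\x81lez, M.\n"  # 'a' + U+0301; the mark is not in Latin-1

print(non_ascii_code_points(precomposed))  # ['U+00E1 LATIN SMALL LETTER A WITH ACUTE']
print(non_ascii_code_points(decomposed))   # ['U+0301 COMBINING ACUTE ACCENT']
```

For the real file, `non_ascii_code_points(open("Gonzalez.txt", "rb").read())` would show which of the two forms is present.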
Indeed, uconv does handle the situation properly:
$ uconv -x Any-Accents -f utf-8 -t l1 < Gonzalez.txt > out
$ file out
out: ISO-8859 text
Opening the file in emacs or Sublime shows the accented "a" properly. The same happens with -x nfc.
Unfortunately, my target environment does not permit a solution using "uconv", so I am looking for a simple solution using either iconv or Python3.
Python3 attempts
My attempts using Python3 have not been successful so far. For example, the following:
import sys
import fileinput  # reads the files named on the command line, or STDIN if none

for line in fileinput.input():
    # Encode each line as Latin-1, substituting "?" for anything Latin-1 lacks
    encoded = line.encode("latin-1", "replace")
    sys.stdout.buffer.write(encoded)
produces:
Gonza?lez, M.
(That's a literal "?".)
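A literal "?" right after the "a" is what one would expect if the "á" is stored as a plain "a" followed by U+0301 COMBINING ACUTE ACCENT: `encode("latin-1", "replace")` keeps the "a" and replaces the combining mark, which has no Latin-1 byte. As a sketch, assuming UTF-8 input, the same loop with an NFC normalization step added (roughly the Python counterpart of `uconv -x nfc`) might look like this. Only standard-library calls are used; `to_latin1` and `convert_stream` are hypothetical helper names.

```python
import io
import unicodedata
from typing import BinaryIO


def to_latin1(text: str, errors: str = "replace") -> bytes:
    # NFC composes 'a' + U+0301 COMBINING ACUTE ACCENT into U+00E1 ('á'),
    # which ISO-8859-1 can represent (as byte 0xE1).
    composed = unicodedata.normalize("NFC", text)
    # Anything still outside Latin-1 after composition is handled per `errors`.
    return composed.encode("latin-1", errors)


def convert_stream(src: BinaryIO, dst: BinaryIO) -> None:
    # Decode explicitly as UTF-8 so the result does not depend on the locale
    # (e.g. LC_ALL=C). Splitting on b"\n" never cuts a UTF-8 sequence in half.
    for raw_line in src:
        dst.write(to_latin1(raw_line.decode("utf-8")))


# Demonstration on an in-memory, decomposed copy of Gonzalez.txt.
src = io.BytesIO(b"Gonza\xcc\x81lez, M.\n")
dst = io.BytesIO()
convert_stream(src, dst)
print(dst.getvalue())  # b'Gonz\xe1lez, M.\n'
```

To use it as a filter, import sys and call `convert_stream(sys.stdin.buffer, sys.stdout.buffer)`. NFC only helps where Latin-1 has a precomposed character for the letter-plus-mark pair, as it does for "á"; a mark with no such partner would still come out as "?".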
I've tried various other Python3 possibilities, so far without success.
Please note that I've reviewed numerous SO questions on this topic, but the answers using iconv or Python3 do not handle Gonzalez.txt properly.