This should be trivial but ...! I am writing to a UTF-8 encoded file and the text includes "Côte d'Ivoire". As I understand it "ô" is U+00F4. The character displays correctly everywhere but ends up in the file as U+C3B4 which should be in the Unicode Block HANGUL_SYLLABLES ("쎴").
Any attempt to replace U+C3B4 with U+00F4 seems to change nothing - all four lines of the file below contain it.
This creates a problem because when the file is eventually written to a database it displays as "Côte d'Ivoire".
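For reference, the two bytes C3 B4 in the hex dump further down are what U+00F4 looks like once it is encoded as UTF-8; that is not the same thing as the code point U+C3B4. Here is a minimal Python 3 sketch of the difference. The last step is only an assumption about what might happen on the way to the database (the UTF-8 bytes being decoded as cp1252); nothing in the post confirms it.

```python
# Python 3 sketch: code point vs. UTF-8 bytes for the character in question.
ch = "\u00f4"  # LATIN SMALL LETTER O WITH CIRCUMFLEX

# Encoding U+00F4 as UTF-8 yields the two bytes seen in the hex dump.
utf8_bytes = ch.encode("utf-8")
print(hex(ord(ch)), utf8_bytes.hex())

# U+C3B4 is an unrelated character; its UTF-8 form is three bytes long.
hangul_bytes = "\uc3b4".encode("utf-8")
print(hangul_bytes.hex())

# Assumption: some later step decodes the UTF-8 bytes as cp1252.
misread = utf8_bytes.decode("cp1252")
print([hex(ord(c)) for c in misread])
```

If the assumption holds, the two resulting characters are U+00C3 and U+00B4, so the file itself would be fine and the wrong decoding would be happening downstream.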
Update: If I use with io.open("Test.html", "w") as f_out: (as below), the file contains the correct U+00F4, which displays as a "?". The final database record still displays as "Côte d'Ivoire" though :-(
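A possible explanation for the difference between the two open calls: when no encoding is passed, Python typically falls back to a platform-dependent default, which on many Windows setups is cp1252. The sketch below writes to a hypothetical scratch file (not the real Test.html) and compares the bytes; the cp1252 line is an illustration of one likely default, not a statement about this particular machine.

```python
import io
import locale
import os
import tempfile

text = "C\u00f4te d'Ivoire"

# Encoding that open()/io.open() typically falls back to when none is given.
print(locale.getpreferredencoding(False))

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "Test.html")

# With an explicit encoding, the bytes on disk do not depend on the platform.
with io.open(path, "w", encoding="utf-8") as f_out:
    f_out.write(text)

with io.open(path, "rb") as f_in:
    raw = f_in.read()
print(raw.hex())

# For comparison: the bytes a cp1252 default would produce for the same text.
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes.hex())
```

In the cp1252 case the character is stored as the single byte F4, which is not valid UTF-8 on its own, so a viewer that expects UTF-8 may well show it as "?".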
MWE:
from __future__ import unicode_literals
import io
line="The current population of Côte d'Ivoire is 26,051,291"
for c in line:
    if ord(c) > 127:
        print(c, c.encode('utf-8').hex())
        line1 = line.replace(u"\uC3B4", "ô")
        line2 = line.replace(c, u"\u00F4")
        line3 = line.replace(c, "ô")
#with io.open("Test.html", "w", encoding="utf-8") as f_out:
with io.open("Test.html", "w") as f_out:
    f_out.write(line+"\n")
    f_out.write(line1+"\n")
    f_out.write(line2+"\n")
    f_out.write(line3+"\n")
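A note on why the replace calls may appear to do nothing: a Python 3 str holds code points, not encoded bytes, so U+C3B4 never occurs in the string, and replacing "ô" with "ô" swaps the character for itself. This is a small sketch of that reading, using escapes instead of the literal character so that it does not depend on the source file's encoding:

```python
line = "The current population of C\u00f4te d'Ivoire is 26,051,291"

# The string holds code points, so U+C3B4 is simply not in it.
has_hangul = "\uc3b4" in line
line1 = line.replace("\uc3b4", "\u00f4")

# Replacing U+00F4 with U+00F4 swaps the character for itself.
line2 = line.replace("\u00f4", "\u00f4")

print(has_hangul, line1 == line, line2 == line)
```

All three replacements therefore leave the text unchanged, which would match the four identical lines in the file.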
Hex editor:
00000000h: 54 68 65 20 63 75 72 72 65 6E 74 20 70 6F 70 75 ; The current popu
00000010h: 6C 61 74 69 6F 6E 20 6F 66 20 43 C3 B4 74 65 20 ; lation of Côte
00000020h: 64 27 49 76 6F 69 72 65 20 69 73 20 32 36 2C 30 ; d'Ivoire is 26,0
00000030h: 35 31 2C 32 39 31 0D 0A 54 68 65 20 63 75 72 72 ; 51,291..The curr
00000040h: 65 6E 74 20 70 6F 70 75 6C 61 74 69 6F 6E 20 6F ; ent population o
00000050h: 66 20 43 C3 B4 74 65 20 64 27 49 76 6F 69 72 65 ; f Côte d'Ivoire
00000060h: 20 69 73 20 32 36 2C 30 35 31 2C 32 39 31 0D 0A ; is 26,051,291..
00000070h: 54 68 65 20 63 75 72 72 65 6E 74 20 70 6F 70 75 ; The current popu
00000080h: 6C 61 74 69 6F 6E 20 6F 66 20 43 C3 B4 74 65 20 ; lation of Côte
00000090h: 64 27 49 76 6F 69 72 65 20 69 73 20 32 36 2C 30 ; d'Ivoire is 26,0
000000a0h: 35 31 2C 32 39 31 0D 0A 54 68 65 20 63 75 72 72 ; 51,291..The curr
000000b0h: 65 6E 74 20 70 6F 70 75 6C 61 74 69 6F 6E 20 6F ; ent population o
000000c0h: 66 20 43 C3 B4 74 65 20 64 27 49 76 6F 69 72 65 ; f Côte d'Ivoire
000000d0h: 20 69 73 20 32 36 2C 30 35 31 2C 32 39 31 0D 0A ; is 26,051,291..