Python will encode Unicode strings to bytes when printing to the console.
Encode explicitly when sending to a browser, by writing bytes directly to sys.stdout.buffer (the text-mode sys.stdout only accepts str in Python 3):
#!/usr/bin/python3.2
import sys
# sys.stdout.buffer is the underlying binary stream; it accepts bytes.
out = sys.stdout.buffer
out.write(b"Content-Type: text/html; charset=utf8\r\n")
out.write(b"\r\n")
y = "£17"
out.write("Test: {0}\r\n".format(y).encode(encoding='utf8'))
Note that HTTP headers should really be terminated with \r\n (carriage return, line feed). I've also added the encoding to the Content-Type header so the browser knows how to decode the response.
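As a rough sketch of the same idea, the headers and the body could be assembled into one bytes value, so the declared charset and the actual encoding cannot drift apart. The function name build_response and its charset parameter are made up for this illustration; they are not part of any library.

```python
def build_response(body, charset="utf-8"):
    """Return a CGI response (header block plus body) as bytes.

    Hypothetical helper: the same charset is used in the Content-Type
    header and for encoding the body.
    """
    # The header block ends with an empty line, i.e. two \r\n in a row.
    header = "Content-Type: text/html; charset={0}\r\n\r\n".format(charset)
    # Header text is plain ASCII; only the body needs the declared charset.
    return header.encode("ascii") + body.encode(charset)


response = build_response("Test: £17\r\n")
```

In a CGI script you would then pass the result to sys.stdout.buffer.write() and flush the stream.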
For HTML, you really want to use character entity references instead of literal non-ASCII characters:
y = "&pound;17"
out.write("Test: {0}\r\n".format(y).encode(encoding='utf8'))
at which point you could also just use ASCII as your encoding.
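If the text still contains literal non-ASCII characters, one option is to let the codec do the substitution. The sketch below uses the standard xmlcharrefreplace error handler, which emits numeric character references (such as &#163;) rather than named entities (such as &pound;). The helper name to_ascii_html is invented for this example, and the handler does not escape &, < or >, so that still has to be done separately.

```python
def to_ascii_html(text):
    """Encode text as ASCII bytes for HTML output.

    Characters outside ASCII become numeric character references.
    Hypothetical helper; it performs no HTML escaping of &, < or >.
    """
    return text.encode("ascii", errors="xmlcharrefreplace")


line = to_ascii_html("Test: £17\r\n")
```

With this approach the Content-Type header could declare an ASCII-compatible charset and the output would still render the pound sign.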
If you really, really, really want to use print(), then re-open stdout with the correct encoding:
utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
print("Content-Type: text/html; charset=utf8", end='\r\n', file=utf8stdout)
print("", end='\r\n', file=utf8stdout)
y = "£17"
print("Test:", y, end='\r\n', file=utf8stdout)
You could simplify that somewhat with functools.partial():
from functools import partial
utf8print = partial(print, end='\r\n', file=utf8stdout)
then use utf8print() without the extra keywords:
utf8print("Content-Type: text/html; charset=utf8")
utf8print("")
# etc.
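To see which bytes such a wrapper would actually send, the same partial() trick can be pointed at an in-memory buffer instead of the real stdout. This is only a sketch for checking the output: io.BytesIO stands in for the binary stream, and newline="" is an assumption made here so that the explicit \r\n is written unchanged on every platform.

```python
import io
from functools import partial

raw = io.BytesIO()  # stand-in for the binary stdout stream
# newline="" turns off newline translation, so "\r\n" is passed through as-is.
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
utf8print = partial(print, end="\r\n", file=stream)

utf8print("Content-Type: text/html; charset=utf8")
utf8print("")
utf8print("Test:", "£17")
stream.flush()  # push buffered text through the encoder into raw

output = raw.getvalue()
```

The pound sign ends up as the two bytes 0xC2 0xA3, which is its UTF-8 encoding.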
Also see the Python Unicode HOWTO for details on how Python sets output encoding, as well as this question here on Stack Overflow about printing and encoding.