Python cgi encoding / special characters

Question

This prints "Test: £17" when run from the local console, but only prints "Test: " when run from the web browser. How can I rectify the issue when loaded through the browser? Thanks!

#!/usr/bin/python3.2
print ("Content-Type: text/html")
print ("")

y = "£17"
print ("Test:", y)

@JonClements: That's optional, the real problem is that Python `print()` encodes automatically to the terminal, but for CGI the wrong encoding is used. — Martijn Pieters, Dec 22 '12 at 15:18

score 4 · Accepted Answer · edited May 23 '17 at 12:16

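When printing, Python encodes str values to bytes using the encoding of sys.stdout. In a console that encoding follows the terminal's locale, so £ comes through fine. Under CGI the web server usually supplies no such locale, and Python falls back to an encoding (often ASCII) that cannot represent £, so the output stops after "Test: ".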
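To see that mechanism in isolation, here is a minimal sketch. It assumes the CGI environment ends up with an ASCII-only stdout encoding. The actual value depends on the server's environment and the Python version, so the script only reports it (on stderr) and then compares encoding the same string as UTF-8 and as ASCII:

```python
import sys

# print() encodes through sys.stdout; in a console its encoding follows the
# terminal locale, under a web server it may be something like ASCII.
current = getattr(sys.stdout, "encoding", None)
print("stdout encoding:", current, file=sys.stderr)

y = "£17"
utf8_bytes = y.encode("utf-8")   # '£' (U+00A3) becomes the two bytes C2 A3

try:
    y.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:       # ASCII has no byte value for '£'
    ascii_ok = False
```

If ascii_ok is False, any print() of y through an ASCII-encoded stdout fails the same way.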

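Encode explicitly when sending to a browser, by writing bytes directly to the binary buffer underneath sys.stdout (sys.stdout itself only accepts str in Python 3):

#!/usr/bin/python3.2
import sys
out = sys.stdout.buffer  # binary layer; writing bytes to sys.stdout raises TypeError
out.write(b"Content-Type: text/html; charset=utf-8\r\n")
out.write(b"\r\n")

y = "£17"
out.write("Test: {0}\r\n".format(y).encode(encoding='utf-8'))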

Note that HTTP headers should use a \r\n (carriage return, newline) combo, really. I've also added the encoding used to the Content-Type header so the browser knows how to decode it again.
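The same idea can be packaged as a small function that returns the whole response as bytes. This is a sketch: the name cgi_response_bytes and its parameters are hypothetical, not part of any library. It ends header lines with CRLF, separates the headers from the body with a blank CRLF line, and encodes the body with the same charset it declares in the header:

```python
def cgi_response_bytes(body, content_type="text/html", charset="utf-8"):
    """Build a complete CGI response as bytes (hypothetical helper).

    Header lines end in CRLF, a blank CRLF line ends the headers, and the
    body is encoded with the charset named in the Content-Type header.
    """
    header = "Content-Type: {0}; charset={1}\r\n\r\n".format(content_type, charset)
    # The header text is plain ASCII; only the body needs the declared charset.
    return header.encode("ascii") + body.encode(charset)
```

In a real script you would pass the result to sys.stdout.buffer.write() and then call sys.stdout.buffer.flush().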

For HTML, you really want to use character entity references instead of literal non-ASCII characters:

y = "&pound;17"
out.write("Test: {0}\r\n".format(y).encode(encoding='utf8'))

at which point you could also just use ASCII as your encoding.
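Rather than writing entity references by hand, Python's built-in "xmlcharrefreplace" error handler can produce them during encoding. It emits numeric references (&#163;) instead of named ones like &pound;, and browsers render both as £. A minimal sketch:

```python
y = "£17"
# Every character ASCII cannot represent becomes a numeric character
# reference, so the result is pure ASCII and safe to send as-is.
ascii_body = "Test: {0}\r\n".format(y).encode("ascii", "xmlcharrefreplace")
```

With this approach the charset in the Content-Type header no longer affects how £ displays, because the body contains only ASCII bytes.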

If you really, really, really want to use print(), then re-open stdout with the correct encoding:

utf8stdout = open(1, 'w', encoding='utf-8', newline='', closefd=False) # fd 1 is stdout; newline='' keeps '\r\n' as written

print("Content-Type: text/html; charset=utf-8", end='\r\n', file=utf8stdout)
print("", end='\r\n', file=utf8stdout)

y = "£17"
print("Test:", y, end='\r\n', file=utf8stdout)
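A related alternative, sketched here rather than taken from the answer above, is to wrap the existing binary buffer in io.TextIOWrapper instead of opening file descriptor 1 a second time. make_utf8_writer is a hypothetical name. The demonstration writes into an in-memory io.BytesIO so the resulting bytes can be checked:

```python
import io

def make_utf8_writer(binary_stream):
    """Wrap a binary stream such as sys.stdout.buffer in a UTF-8 text layer.

    newline='' disables newline translation, so CRLF line endings passed
    to print() reach the stream unchanged on every platform.
    """
    return io.TextIOWrapper(binary_stream, encoding='utf-8', newline='')

# Demonstration against an in-memory buffer rather than the real stdout.
raw = io.BytesIO()
writer = make_utf8_writer(raw)
print("Content-Type: text/html; charset=utf-8", end='\r\n', file=writer)
print("", end='\r\n', file=writer)
print("Test:", "£17", end='\r\n', file=writer)
writer.flush()  # push the buffered text down into raw
result = raw.getvalue()
```

For a real CGI script, pass sys.stdout.buffer instead. Closing or garbage-collecting the wrapper also closes the buffer it wraps, so call writer.detach() after flushing if sys.stdout has to stay usable.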

You could simplify that somewhat with functools.partial():

from functools import partial
utf8print = partial(print, end='\r\n', file=utf8stdout)

then use utf8print() without the extra keywords:

utf8print("Content-Type: text/html; charset=utf-8")
utf8print("")
# etc.
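Here is a self-contained version of the partial() trick, with an io.StringIO standing in for utf8stdout so the output can be inspected. This is an assumption made for testing only; in the CGI script the file argument would be utf8stdout. It also shows that keywords passed at call time override the ones bound by partial():

```python
import io
from functools import partial

# In-memory stand-in for utf8stdout so the written text can be checked.
buf = io.StringIO()
utf8print = partial(print, end='\r\n', file=buf)

utf8print("Content-Type: text/html; charset=utf-8")
utf8print("")
utf8print("Test:", "£17")
# A keyword given at call time replaces the one stored in the partial.
utf8print("<!-- no trailing newline -->", end='')

page = buf.getvalue()
```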

Also see the Python Unicode HOWTO for details on how Python sets output encoding, as well as this question here on Stack Overflow about printing and encoding.

Many thanks! The last option worked for the test case and for what I am working on. I wanted to use the first suggestion but it didn't seem to be working right =/ This has been a huge help, and I learned a bunch too, and is going to impact the way I do several things. Thanks again. — stackity, Dec 23 '12 at 00:47
it seems as if there are output maximums with this method, I'm troubleshooting now. If I use utf8print too much I get a server error. — stackity, Dec 23 '12 at 06:48
@stackity: I doubt that anything maxes out; more likely you made a different error. :-) — Martijn Pieters, Dec 23 '12 at 10:04
I came back to delete the comment (hoping that no one had responded), you're right, my shared hosting was limiting the amount of threads I could make in comparison to my localhost. My diagnosing was incorrect. Many thanks again! — stackity, Dec 23 '12 at 12:40
