
For HTML5 and Python CGI:

If I include the UTF-8 meta tag, my code doesn't work. If I leave it out, it works.

Page encoding is UTF-8.

print("Content-type:text/html")
print()
print("""
    <!doctype html>
    <html>
    <head>
        <meta charset="UTF-8">
    </head>
    <body>
        şöğıçü
    </body>
    </html>
""")

This code doesn't work.

print("Content-type:text/html")
print()
print("""
    <!doctype html>
    <html>
    <head></head>
    <body>
        şöğıçü
    </body>
    </html>
""")

But this code works.

Martijn Pieters
user1898723
  • Did you specify a source encoding? What encoding was your file saved in by your editor? Python sends the string you typed *literally*, so if you saved this file in Latin-1 encoding, that's what will be sent. – Martijn Pieters Feb 13 '13 at 18:07
  • The file encoding is UTF-8. Normally it works; it worked in my previous projects. I use Python 3.3, whose default encoding is UTF-8. By the way, I don't speak English well, so I sometimes don't understand. – user1898723 Feb 13 '13 at 18:23
  • Aha, that's important information! You need to explicitly encode in that case, really. – Martijn Pieters Feb 13 '13 at 18:25
  • If you tell me where the problem is, I can do that. Is it the server, browser, editor, HTML, etc.? – user1898723 Feb 13 '13 at 18:32
  • Voting to re-open this. Python 3 CGI printing is a *common pain point*, and far from a localised problem. – Martijn Pieters Feb 14 '15 at 14:49
  • Seconded. Problems often occur if the web server does not specify a locale for Python 3 to use, so Python 3 assumes the C locale with ASCII or a legacy charset, or if the locale is not a UTF-8 locale... – Antti Haapala -- Слава Україні Feb 14 '15 at 14:58

2 Answers


For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, which has been opened with a specific encoding. How that encoding is determined is platform dependent and can differ based on how the script is run. When your script runs as a CGI script, you generally do not know what encoding will be used.
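
To see which encoding Python picked in a particular environment, a small diagnostic along these lines can help. It is only a sketch: `describe_stdout` is a hypothetical helper name, and the values it reports depend entirely on how the server starts the script.

```python
import locale
import os
import sys

def describe_stdout():
    """Collect the settings that influence how print() encodes text."""
    return {
        # Codec print() will use; None if stdout was replaced by an
        # object without an encoding attribute
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the process locale (LANG, LC_ALL, ...)
        "locale_encoding": locale.getpreferredencoding(False),
        # Explicit override, if the server or shell exported one
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }

for name, value in describe_stdout().items():
    print(name, value, sep=": ")
```

Running it once from a shell and once through the web server (after the response headers have been sent) often shows different values, which is the platform dependence described above.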

In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> tag, your browser correctly guesses that encoding (or the server has communicated it in the Content-Type header). With the <meta> tag, you are telling the browser to use a different encoding, one that is incorrect for the data produced.
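
The mismatch can be reproduced without a web server. The sketch below assumes, purely for illustration, that the server locale selects ISO-8859-9 (Latin-5, Turkish); the actual encoding on the asker's server is not known.

```python
text = "şöğıçü"

# Assumption: the server locale makes print() encode as ISO-8859-9
sent = text.encode("iso-8859-9")

# A browser obeying <meta charset="UTF-8"> decodes those bytes as UTF-8;
# errors="replace" mimics the replacement characters a browser shows
as_utf8 = sent.decode("utf-8", errors="replace")

# Without the tag, the browser may guess the encoding actually used
as_latin5 = sent.decode("iso-8859-9")

print(as_utf8 == text)    # False
print(as_latin5 == text)  # True
```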

You can write directly to sys.stdout.buffer, after explicitly encoding to UTF-8. Make a helper function to make this easier:

import sys

def enc_print(string='', encoding='utf8'):
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')

enc_print("Content-type:text/html")
enc_print()
enc_print("""
    <!doctype html>
    <html>
    <head>
        <meta charset="UTF-8">
    </head>
    <body>
        şöğıçü
    </body>
    </html>
""")
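
Since the encoding can also be communicated in the Content-Type header, a script may state it there too, so that the header, the <meta> tag, and the bytes all agree. The following is a sketch under that assumption; `build_response` and `main` are hypothetical helper names, and whether the server passes the header through unchanged depends on its configuration.

```python
import sys

def build_response(body, encoding="utf-8"):
    """Return the complete CGI response (header + body) as bytes."""
    # The charset parameter tells the browser how to decode the body
    header = "Content-Type: text/html; charset={}\n\n".format(encoding)
    return (header + body).encode(encoding)

PAGE = """<!doctype html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    şöğıçü
</body>
</html>
"""

def main():
    # Writing bytes to sys.stdout.buffer bypasses the locale's codec
    sys.stdout.buffer.write(build_response(PAGE))
    sys.stdout.buffer.flush()
```

Calling `main()` at the end of the CGI script sends the whole response in one write.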

Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the codec you need:

import sys
import io

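def set_output_encoding(codec, errors='strict'):
    # Read the setting before detach() leaves the old wrapper unusable
    line_buffering = sys.stdout.line_buffering
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=line_buffering)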

set_output_encoding('utf8')

print("Content-type:text/html")
print()
print("""
    <!doctype html>
    <html>
    <head></head>
    <body>
        şöğıçü
    </body>
    </html>
""")
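
On Python 3.7 and later, text streams also offer `reconfigure()`, which changes the encoding in place without replacing `sys.stdout`. That postdates the Python 3.3 setup in the question, so treat the following as an alternative sketch; `force_utf8` is a hypothetical helper name.

```python
import io
import sys

def force_utf8(stream, errors="strict"):
    """Switch a text stream to UTF-8 in place (Python 3.7+)."""
    # io.TextIOWrapper gained reconfigure() in Python 3.7; other stream
    # types (for example io.StringIO) lack it and are left untouched
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8", errors=errors)
    return stream

force_utf8(sys.stdout)
print("Content-Type: text/html; charset=utf-8")
print()
print("şöğıçü")
```
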
Martijn Pieters
  • This is really nice, but from what I see in the docs, this could also be implementation-dependent! `This is not part of the TextIOBase API and may not exist in some implementations` (from http://docs.python.org/3/library/io.html ) – Zenon Feb 13 '13 at 23:43
  • @Zenon: that may apply to certain implementations indeed (specifically `StringIO`), but the `stdout` stream definitely has a `.buffer` attribute; that is documented in the [`sys.stdout` documentation](http://docs.python.org/3/library/sys.html#sys.stdout). – Martijn Pieters Feb 13 '13 at 23:56
  • This does not work in Python 3.4; an `Internal Server Error` occurred. – alireza Jan 06 '15 at 19:39
  • @alireza.m: the code, as posted here, works just fine on Python 3.4; I retested it just now. You have a different problem, I'm afraid. Use `import cgitb; cgitb.enable()` (see the [module documentation](https://docs.python.org/3/library/cgitb.html)) to get more meaningful errors. – Martijn Pieters Jan 06 '15 at 19:58
  • Worked perfectly! Thanks, this solved a big problem in my code. Thanks a lot. Is this the right way, or should I use print()? – alireza Jan 07 '15 at 05:41
  • @alireza.m: this is the correct way; `print()` can easily fail if the CGI server doesn't provide a correct encoding for Python (it never does). – Martijn Pieters Jan 07 '15 at 08:00

From https://ru.stackoverflow.com/a/352838/11350

First, don't forget to declare the encoding in the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Then try

import sys
import codecs

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
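
To see what the `codecs.getwriter` wrapper does without touching the real `sys.stdout`, the same call can be tried on an in-memory byte buffer. This is only an illustration; `io.BytesIO` stands in for the detached binary stdout.

```python
import codecs
import io

raw = io.BytesIO()                       # stand-in for sys.stdout.detach()
writer = codecs.getwriter("utf-8")(raw)  # StreamWriter that encodes on write

# print() only needs an object with write(), so it accepts the writer
print("şöğıçü", file=writer)

# The buffer now holds the UTF-8 bytes regardless of the locale
print(raw.getvalue() == "şöğıçü\n".encode("utf-8"))  # True
```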

Or, if you use Apache 2, add this to your configuration:

AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf8
Community
  • Works when all else doesn't. But is it not absurd that we have to go through this kind of arcane nonsense to do something so basic? – havlock Nov 25 '17 at 17:18
  • Thank you for what seems like the simplest and best answer. Converting a CGI script from Python 2 to Python 3 is such a hassle! –  Jan 02 '19 at 19:49