
I cannot get Python CGI to print Hebrew characters to an HTML page on Linux. This script demonstrates the problem:

#!/usr/bin/python3
print('Content-Type: text/html; charset=utf-8\n\n')
print ('<html><body>')
print ('first')
print ('second')
print ('תמות')
print ('third')
print ('</body></html>')

The file is saved as UTF-8 (without a BOM), and I call the .cgi script directly from the browser address bar. The output is:

first second

The Hebrew word and everything after it are missing. No error appears in the Apache log or with cgitb enabled.

I tested with Apache 2.2 and Python 3.2 on both Ubuntu 12.04 and CentOS 6, with Firefox, Chrome and IE. Hebrew displays correctly on any plain HTML page, and on Windows the script works fine.

Edit: while the final solution is indeed given by the linked question, this is still not a duplicate. See my comments below.

o17t H1H' S'k
  • When you do a "view source" what do you see? – Mark Ransom Nov 16 '12 at 23:06
  • the source shows: first second – o17t H1H' S'k Nov 16 '12 at 23:09
  • Try looking at http://stackoverflow.com/questions/3597480/how-to-make-python-3-print-utf8 – Mark Ransom Nov 16 '12 at 23:16
  • Or http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/ – Mark Ransom Nov 16 '12 at 23:24
  • sys.stdout.buffer.write doesn't seem to work at all for CGI, even without Hebrew. – o17t H1H' S'k Nov 16 '12 at 23:32
  • What do you get if you run it from the command line and redirect it into a file? Does the file contain what you expect? – David K. Hess Nov 16 '12 at 23:57
  • This is not a duplicate, and the referenced question does not solve my problem, as the buffered output does not work in this context. – o17t H1H' S'k Nov 17 '12 at 12:37
  • I agree. This is not a duplicate and the solution eyaler gave is better than in the linked question. – David K. Hess Nov 19 '12 at 22:24
  • Hey, 7 years later, I can confirm this is not a duplicate, as the problem occurs only when trying to print the characters on the web server. Anyhow, @eyaler did you find a solution? – Yotam Dec 15 '18 at 14:43
  • @Yotam see my solution in the question – o17t H1H' S'k Dec 16 '18 at 15:07
  • My solution was deleted from the question. Running import sys; print(sys.stdout.encoding) gave me "ANSI_X3.4-1968". This finally solved my problem: import sys, codecs; sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach()). Another option is: import sys, io; sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8') – o17t H1H' S'k Nov 13 '21 at 21:47
  • @HenryEcker Once I found out that the issue was with the ANSI encoding, I could ask how to set the correct encoding. But this question was about why CGI was not printing Hebrew, and it may be useful in this context for others who do not know the reason for this issue beforehand. – o17t H1H' S'k Nov 13 '21 at 21:59
  • I understand. That said, the answer you included in your post matches those on the now linked thread. Your question now actually points to useful answers which _should_ work to fix the problem even if they do not directly explain _why_ they work in this _specific_ scenario. Duplicates should point to useful _answers_ even if the question is different. (Are you saying that the linked thread does not provide useful answers to this question?) – Henry Ecker Nov 13 '21 at 22:01
  • I want to clarify that I do not believe this question should be deleted. This is an awesome sign post for this very specific issue to a thread which provides reasonable answers. Which is _why_ questions are closed as duplicates. Which is why I made the change since the original duplicate closure was unhelpful. – Henry Ecker Nov 13 '21 at 22:04
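
The rewrapping fix reported in the comments can be sketched without a web server. This is only an illustration: `rewrap_as_utf8`, `sink` and `ascii_stdout` are hypothetical names, and an in-memory `io.BytesIO` stands in for the real CGI output pipe. It simulates a stdout whose encoding resolved to ASCII (as with "ANSI_X3.4-1968"), shows that writing Hebrew to it fails, and then wraps the same underlying buffer in a UTF-8 text layer.

```python
import io

def rewrap_as_utf8(text_stream):
    """Return a new text stream over the same binary buffer, encoding as UTF-8.

    detach() flushes text_stream and leaves the old wrapper unusable.
    """
    return io.TextIOWrapper(text_stream.detach(), encoding='utf-8')

# Hypothetical stand-in for the CGI pipe, wrapped with an ASCII text layer.
sink = io.BytesIO()
ascii_stdout = io.TextIOWrapper(sink, encoding='ascii')

try:
    ascii_stdout.write('תמות')
    ascii_stdout.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True  # ASCII cannot represent Hebrew letters

# Same underlying buffer, now with a UTF-8 text layer on top.
utf8_stdout = rewrap_as_utf8(ascii_stdout)
utf8_stdout.write('תמות')
utf8_stdout.flush()
written = sink.getvalue()
```

In an actual script the equivalent step would presumably be `sys.stdout = rewrap_as_utf8(sys.stdout)`, executed before the first `print` call.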

1 Answer


Looks like the default encoding for sys.stdout isn't necessarily UTF-8. If you want to use sys.stdout.buffer.write, try this:

import sys
sys.stdout.buffer.write('תמות'.encode('utf-8'))
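
A fuller sketch of the bytes-only route, under some assumptions: `build_response` and `send` are hypothetical helper names, not part of any library. One possible reason this route appeared not to work under CGI (an assumption, not confirmed in this thread) is mixing `print()` with `sys.stdout.buffer.write()`. The text layer keeps its own buffer, so body bytes written directly can reach the server before a header that was printed earlier. Building the whole response as bytes and writing it in one call avoids that ordering question.

```python
import sys

def build_response(body_lines):
    """Return a full CGI response (header, blank line, body) as UTF-8 bytes."""
    header = 'Content-Type: text/html; charset=utf-8\r\n\r\n'
    body = '\n'.join(body_lines) + '\n'
    return (header + body).encode('utf-8')

def send(payload, stream=None):
    """Write the payload to a binary stream (default: stdout's byte buffer)."""
    stream = sys.stdout.buffer if stream is None else stream
    stream.write(payload)
    stream.flush()

response = build_response(
    ['<html><body>', 'first', 'second', 'תמות', 'third', '</body></html>']
)

if __name__ == '__main__':
    send(response)
```

Because nothing passes through the text layer, the value of `sys.stdout.encoding` no longer affects the output.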
David K. Hess