22

I have simplified my code to make the problem easier to follow. Here it is:

case 1:

# -*- coding: utf-8 -*-

text = "چرا کار نمیکنی؟" # also using u"...." results the same
print(text)

output:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

case 2:

text = "چرا کار نمیکنی؟".encode("utf-8") 
print(text)

there is no output.

case 3:

import sys

text = "چرا کار نمیکنی؟".encode("utf-8")
sys.stdout.buffer.write(text)

output:

چرا کار نمیکنی؟

I know that case 3 works somehow, but I want to use other functions such as print(), write(str()), and so on.

I also read the Python 3 documentation on Unicode here.

I have also read dozens of Q&As on Stack Overflow.

And here is a long article explaining the problem and its answer for Python 2.x.

The simple question is:

How can I print non-ASCII characters such as Farsi or Arabic using Python's print() function?

Update 1: since many people suggested that the problem lies with the terminal, I tested this case:

case 4:

text = "چرا کار نمیکنی؟".encode("utf-8")  # using u"...." gives the same result
print(text)

terminal :

python persian_encoding.py > test.txt

test.txt :

b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'

Very important update:

After playing around with this issue for a while, I finally found another workaround that makes cmd.exe do the job (without needing third-party software such as ConEmu):

A little explanation first:

Our main problem does not concern Python. It is a problem with the Command Prompt character set in Windows (for a complete explanation, see Arman's answer). If you change the Command Prompt's character set to UTF-8 instead of its default (non-UTF-8) code page, the Command Prompt will be able to handle UTF-8 characters (such as Farsi or Arabic). This does not guarantee that the characters are displayed well (they may be printed as little squares), but it is a good solution if you want to do file I/O in Python with UTF-8 characters.

Steps:

Before starting Python from the command line, type:

chcp 65001

Now run your Python code as usual:

python testcode.py

result in case 1:

?????? ??? ??????

It runs without errors, although the characters are not rendered.


For more information on setting 65001 as the default code page, check this out.

Community
Soorena
  • What's your platform and editor/terminal? This is not a problem with Python; more likely your output encoding can't encode the Persian text. – Mazdak Sep 16 '16 at 09:54
  • هالو جهان It works fine on mine: `>>> text = "چرا کار نمیکنی؟"` `>>> print(text)` `چرا کار نمیکنی؟` The only difficulty is that the character reading order is reversed during copy and paste. – Flint Sep 16 '16 at 10:14
  • @Kasramvd python 3 , windows 10 , intellij 9 – Soorena Sep 16 '16 at 10:20
  • @Soorena So search for changing the output encoding in windows 10 , intellij 9. – Mazdak Sep 16 '16 at 10:22
  • @Kasramvd I've searched for this but no matter what terminal or IDE used , the result is the same , if you read the BPL answer you will see that no matter which terminal or platform you use , the print() function or write() to file function don't work. – Soorena Sep 16 '16 at 10:26
  • the problem is everywhere , Farsi file-name , Farsi text files , writing to files for Farsi characters , anything. – Soorena Sep 16 '16 at 10:29
  • @Soorena I've edited my answer and provided you a solution for a specific terminal, ConEmu – BPL Sep 16 '16 at 10:44
  • @Soorena No, it's not. Python can simply handle Unicode, especially Python 3, where strings are Unicode by default. Your problem is display, and that is entirely a matter of output encoding. I have Sublime on Linux and can simply see the Unicode in the output, even in the Linux terminal. There is nothing special about Unicode except the different ways of displaying it. And it is the console's job to show a proper result based on your encoding. – Mazdak Sep 16 '16 at 10:45
  • @Kasramvd you were absolutely right, thank you. I updated my question and explained the problem and the solution with Windows Console. – Soorena Sep 18 '16 at 13:28

4 Answers

15

Your code is correct; it works on my computer with both Python 2 and 3 (I'm on OS X):

~$ python -c 'print "تست"'
تست
~$ python3 -c 'print("تست")'
تست

The problem is that your terminal cannot output Unicode characters. You can verify this by redirecting the output to a file, e.g. python3 my_file.py > test.txt, and opening the file in an editor.
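As a rough way to check this from inside Python, the sketch below prints the encoding Python has chosen for stdout and tests whether the text can be encoded at all. The helper name can_encode is hypothetical, and cp1252 is only an assumed stand-in for a legacy Windows code page; the code page on the asker's machine is not stated in the question.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


text = "چرا کار نمیکنی؟"

# The encoding Python picked for stdout (may be None if stdout was replaced).
print("stdout encoding:", sys.stdout.encoding)

# cp1252 is only an assumed example of a legacy Windows code page.
print("cp1252 can encode it:", can_encode(text, "cp1252"))
print("utf-8 can encode it:", can_encode(text, "utf-8"))
```

If the stdout encoding reported here is a legacy code page that cannot encode the text, print() raises the UnicodeEncodeError shown in case 1.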

If you are on Windows, you could use a terminal such as Console2 or ConEmu, which render Unicode better than the Windows prompt.

You may encounter errors with these terminals too, because of wrong Windows code pages/encodings. There is a small Python package that sets them correctly:

1- Install it: pip install win-unicode-console

2- Put this at the top of your Python file:

try:
    # Fix UTF8 output issues on Windows console.
    # Does nothing if package is not installed
    from win_unicode_console import enable
    enable()
except ImportError:
    pass

If you get errors when redirecting to a file, you may fix them by setting the I/O encoding:

On Windows command line:

SET PYTHONIOENCODING=utf-8

On Linux/OS X terminal:

export PYTHONIOENCODING=utf-8
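On Python 3.7 and later, a similar effect can probably be had from inside the script with the reconfigure() method of text streams. The sketch below uses an in-memory stream as a stand-in for the console so that it runs anywhere; the cp1252 starting encoding is an assumption for illustration.

```python
import io

# Stand-in for a console stream that uses a legacy code page (cp1252 is an
# assumption for illustration; a real script would act on sys.stdout instead).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

# Python 3.7+: switch the stream to UTF-8 before anything is written to it.
stream.reconfigure(encoding="utf-8")

print("چرا کار نمیکنی؟", file=stream)
stream.flush()

# The underlying bytes are now UTF-8 and decode back to the original text.
written = raw.getvalue().decode("utf-8")
```

In a real script the equivalent call would be sys.stdout.reconfigure(encoding="utf-8"). This only changes the bytes Python emits; whether the glyphs are drawn correctly still depends on the terminal.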

Some points

  • There is no need to use the u"aaa" syntax in Python 3. String literals are Unicode by default.
  • The default source file encoding is UTF-8 in Python 3, so the coding declaration comment (e.g. # -*- coding: utf-8 -*-) is not needed.
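A related point for file output: the encoding that open() uses when none is given is a separate setting from the source file encoding, and on many setups it follows the system locale. A minimal sketch, assuming the goal is simply to write the text to a file and read it back, is to pass encoding="utf-8" explicitly. The temporary path is only there to keep the example self-contained.

```python
import os
import tempfile

text = "چرا کار نمیکنی؟"

# A temporary path keeps the example self-contained; any path would do.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Without encoding=..., open() uses a locale-dependent default on many setups.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
```

The file then has to be opened as UTF-8 in the editor as well.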
Arman Ordookhani
  • @Arman Ordookhani: Just for the record, win-unicode-console worked for me on conemu (case1) but it won't on command prompt. Also had problems installing on py2.x – BPL Sep 16 '16 at 10:52
  • @BPL I think command prompt does not have the ability to show Persian characters (only support a very small subset of unicode) or maybe there is some config of cmd.exe that I'm not aware of. – Arman Ordookhani Sep 16 '16 at 11:17
  • doesn't work when i try to write it in file – Jawad Jan 19 '22 at 00:07
6

The output basically depends on the platform and terminal on which you run your code. Let's examine the snippet below on different Windows terminals, running with either 2.x or 3.x:

# -*- coding: utf-8 -*-
import sys

def case1(text):
    print(text)

def case2(text):
    print(text.encode("utf-8"))

def case3(text):
    sys.stdout.buffer.write(text.encode("utf-8"))

if __name__ == "__main__":
    text = "چرا کار نمیکنی؟"

    for case in [case1, case2, case3]:
        try:
            print("Running {0}".format(case.__name__))
            case(text)
        except Exception as e:
            print(e)

        print('-'*80)

Results

Python 2.x

Sublime Text 3 3122

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    چرا کار نمیکنی؟--------------------------------------------------------------------------------

ConEmu v151205

    Running case1
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ
    --------------------------------------------------------------------------------
    Running case2
    'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
    --------------------------------------------------------------------------------
    Running case3
    'file' object has no attribute 'buffer'
    --------------------------------------------------------------------------------

Windows Command Prompt

    Running case1
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ
    --------------------------------------------------------------------------------

    Running case2
    'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
    --------------------------------------------------------------------------------

    Running case3
    'file' object has no attribute 'buffer'
    --------------------------------------------------------------------------------

Python 3.x

Sublime Text 3 3122

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    چرا کار نمیکنی؟--------------------------------------------------------------------------------

ConEmu v151205

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ--------------------------------------------------------------------------------

Windows Command Prompt

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <unde
    fined>
    --------------------------------------------------------------------------------

    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda
    \xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------

    Running case3
    ┌åÏ▒Ϻ ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îσ----------------------------------------------------
    ----------------------------

As you can see, only the Sublime Text 3 terminal (case3) worked properly. The other terminals did not support Persian. The main point is that the result depends on which terminal and platform you are using.
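When the terminal cannot be changed, one defensive option (not something tested in the results above) is to keep print() from raising by escaping the characters the stream cannot represent. The sketch below assumes Python 3; safe_print is a hypothetical helper name, and the in-memory cp1252 stream only imitates a legacy console.

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # backslashreplace turns unencodable characters into \uXXXX escapes
    # instead of raising UnicodeEncodeError.
    encoded = text.encode(encoding, errors="backslashreplace")
    print(encoded.decode(encoding), file=stream)


# Demo against an in-memory stream pretending to be a cp1252 console.
raw = io.BytesIO()
fake_console = io.TextIOWrapper(raw, encoding="cp1252")
safe_print("چرا", fake_console)
fake_console.flush()
result = raw.getvalue().decode("cp1252").rstrip()
```

The output is escape sequences rather than readable Persian, so this only avoids the crash; it does not replace a terminal that supports UTF-8.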

Solution (ConEmu specific)

Modern terminals like ConEmu allow you to work with UTF-8 encoding as explained here, so let's try:

chcp 65001 & cmd

And then run the script again against 2.x and 3.x:

Python2.x

Running case1
��را کار نمیکنی؟[Errno 0] Error
--------------------------------------------------------------------------------
Running case2
'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
--------------------------------------------------------------------------------
Running case3
'file' object has no attribute 'buffer'
--------------------------------------------------------------------------------

Python3.x

Running case1
چرا کار نمیکنی؟
--------------------------------------------------------------------------------
Running case2
b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
--------------------------------------------------------------------------------
Running case3
چرا کار نمیکنی؟--------------------------------------------------------------------------------

As you can see, the output is now successful with Python 3 case1 (print). So, the moral of the story: learn more about your tools and how to configure them properly for your use cases ;-)

BPL
  • I wrote in the question: I know that case 3 works somehow, but I want to use other functions like print(), write(str()), .... For example, consider that I want to write these characters to a text file; again, its output will be byte characters, no matter which text editor is used for opening the file. – Soorena Sep 16 '16 at 10:16
  • regarding this , your answer will not help me. but I really appreciate your efforts and time spent on my question. – Soorena Sep 16 '16 at 10:17
  • @Soorena I don't know why it's not helping you. I just want you to understand that the output depends on the terminal you're using. It wouldn't be a problem if you were using a multiplatform GUI application written in Qt. About file printing, same thing; I'll edit my answer very soon regarding that. Which platform/terminal/editor are you using? – BPL Sep 16 '16 at 10:19
  • Windows 10 64-bit, IntelliJ IDEA 9, Python 3.5.2 – Soorena Sep 16 '16 at 10:24
  • @Soorena I've edited my answer again, hope it helps. I've given the solution for [conemu](https://conemu.github.io/). I've used cygwin, msysgit and the vs_cmd prompt, and I don't like any of them. ConEmu is a great choice that reminds me of the powerful Unix terminals (though it is not as powerful); I recommend it to you ;) – BPL Sep 16 '16 at 10:39
  • thanks , I used Console2 instead of ConEmu , both of them work fine. – Soorena Sep 16 '16 at 11:17
  • the point about ConEmu console was a great help, and it's a very time-saving console trick. thanks again. – Soorena Sep 16 '16 at 21:56
  • tnx a lot. this method worked for me: sys.stdout.buffer.write(text.encode("utf-8")) – ali reza Jun 08 '22 at 08:38
1

I can't reproduce the problem. Here is my script p.py:

text = "چرا کار نمیکنی؟"
print(text)

And the result of python3 p.py:

چرا کار نمیکنی؟

Are you sure you're using Python 3? With python2 p.py:

SyntaxError: Non-ASCII character '\xda' in file p.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
aluriak
0

And if you add the text.encode("utf-8") part, it will show as b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f' (on my machine).

EDIT: Sorry for the edit, but I can't comment (not enough reputation).

Even on Python 2.7, print(text) does work. Check out this link here, which I just generated.
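The b'...' form appears because encode() returns a bytes object, and in Python 3 print() shows the repr of bytes rather than decoding them. A small sketch of the round trip (the variable names are only illustrative):

```python
text = "چرا کار نمیکنی؟"

encoded = text.encode("utf-8")      # a bytes object, not a str
shown = repr(encoded)               # the b'\x..' form that print(encoded) displays
decoded = encoded.decode("utf-8")   # decoding gives back the original str
```

So the bytes are not wrong; they are simply displayed as escaped bytes until they are decoded or written to a binary stream such as sys.stdout.buffer.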

Maettel