
I'm using goslate to access the Google Translate API.

I can translate Bengali to English:

>>> import goslate

>>> gs = goslate.Goslate()
>>> S = gs.translate("ভাল", 'en')
>>> S

good

But a problem arises when I want to translate English to Bengali.

>>> import goslate

>>> gs = goslate.Goslate()
>>> S = gs.translate("good", 'bn')
>>> S

Error:

return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2:     character maps to <undefined>

What should I do?

print repr(S)
output: u'\u09ad\u09be\u09b2'

print("ভাল")
output: ভাল

print(u"ভাল") # this gives UnicodeEncodeError
Zabir Al Nazi
Shahriar

2 Answers

1

This works for me

#coding: utf-8

from sys import setdefaultencoding, getdefaultencoding

d=getdefaultencoding()
if d != "utf-8":
    setdefaultencoding('utf-8')
st="ভাল"
f=open('test.txt','w')
f.write(st.encode('utf-8'))
f.close()
if d != "utf-8":
    setdefaultencoding(d)

This writes "ভাল" to test.txt as expected. print st.encode('utf-8') works too.

ForceBru
  • It should work actually. But this gives me `ভাল`. What do I need to do? Maybe I need to change something in IDLE? – Shahriar Feb 17 '15 at 13:20
  • Even `#coding: ...` doesn't help?? – ForceBru Feb 17 '15 at 13:22
  • `#coding` this is a comment ? – Shahriar Feb 17 '15 at 13:22
  • Throw IDLE away and use console for output. That may help. – ForceBru Feb 17 '15 at 13:22
  • @AerofoilKite, `coding` is special directive. – ForceBru Feb 17 '15 at 13:23
  • I need to store this "ভাল" in text file. – Shahriar Feb 17 '15 at 13:23
  • it won't work if the environment uses character encoding different from utf-8 – jfs Feb 17 '15 at 13:24
  • @J.F.Sebastian, okay, I used `encode()`, this won't work too? @AerofoilKite, just `write()` it and that's all. – ForceBru Feb 17 '15 at 13:26
  • Maybe your technique works on your machine? Something is wrong with my software. I'm using Spyder. – Shahriar Feb 17 '15 at 13:28
  • @AerofoilKite, try to execute this full script I gave using the command prompt (`cmd.exe` or Terminal on Mac OS) – ForceBru Feb 17 '15 at 13:30
  • the `.encode()` call leads to `UnicodeDecodeError` (do not encode bytes). `st` should be a Unicode string that you can print directly. – jfs Feb 17 '15 at 13:45
  • @J.F.Sebastian, _it doesn't_ on my system. `#coding: utf-8` gives opportunity to use any Unicode character in script. – ForceBru Feb 17 '15 at 13:53
  • the source code encoding (declared using `#coding: utf-8`) has nothing to do with the error. It is easy to demonstrate even with ascii source code: `b'\xe0\xa6\xad'.encode('utf-8')` – jfs Feb 17 '15 at 13:56
  • @J.F.Sebastian, this code _works on my system_. So it does _on the OP's one_. – ForceBru Feb 17 '15 at 13:57
  • I don't see any evidence of that. It won't work on a typical Python 2 installation where `sys.getdefaultencoding()` returns `'ascii'`. – jfs Feb 17 '15 at 13:59
  • it won't work (`ImportError`). You can't just call `sys.setdefaultencoding` and for a good reason. It hides bugs and may break 3rd party-libraries that don't expect non-default encoding. – jfs Feb 17 '15 at 14:10
  • @J.F.Sebastian, I can do it. Maybe you're using some old Python version. – ForceBru Feb 17 '15 at 14:13
  • @ForceBru: You can do it because your installation is non-standard (customized). Somewhere in your startup files you have `reload(sys)`. The standard installation has no `sys.setdefaultencoding()`. Here's [the place where it is removed](https://hg.python.org/cpython/file/2.7/Lib/site.py#l542) – jfs Feb 17 '15 at 14:15
  • @J.F.Sebastian, what are we talking about? Did this answer help the OP? Yes. No problem then. – ForceBru Feb 17 '15 at 14:20
0

This is definitely unrelated to goslate. Your issue is getting print u'\u09ad\u09be\u09b2' to work when the Unicode characters can't be represented in the console's character encoding.
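A minimal sketch of why the print fails, under the assumption that the console uses a legacy Windows code page such as cp1252 (the question does not say which one is active). The helper `can_encode` is a hypothetical name introduced for illustration; it only checks whether a string can be represented in a given encoding.

```python
import sys

text = u'\u09ad\u09be\u09b2'  # the Bengali string from the question

def can_encode(s, encoding):
    """Return True if every character of s is representable in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# 'cp1252' is an assumed example of a console code page, not taken from the question.
print(can_encode(text, 'cp1252'))  # False: cp1252 has no Bengali characters
print(can_encode(text, 'utf-8'))   # True: utf-8 can represent any Unicode character

# The encoding that print actually uses for this session.
print(sys.stdout.encoding)
```

If the last line reports something other than utf-8, printing the string is likely to raise the same UnicodeEncodeError shown in the question.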

You need to either change the console encoding to one that can represent these characters, such as utf-8, or use a Unicode API such as WriteConsoleW, assuming you are on Windows. If you are not on Windows, just configure your environment to use utf-8.

Using WriteConsoleW directly is complicated, though the win_unicode_console package makes it simple on Python 3. The latter link also shows how to save the printed Unicode text to a file (print Unicode, set PYTHONIOENCODING).
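If the goal is only to store the text in a file, one option that sidesteps the console encoding is to write the Unicode string through `io.open` with an explicit encoding. This is a sketch rather than the approach from the linked answer; the file name `test_bn.txt` and the temporary directory are assumptions chosen for illustration.

```python
import io
import os
import tempfile

text = u'\u09ad\u09be\u09b2'  # Unicode string, e.g. the result of the translation

# Hypothetical output path; any writable location works.
path = os.path.join(tempfile.gettempdir(), 'test_bn.txt')

# io.open encodes on write, so the Unicode string is passed as-is (no .encode() call).
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the file back with the same encoding to confirm the round trip.
with io.open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip == text)
```

`io.open` behaves the same on Python 2.6+ and Python 3, and it does not depend on `sys.getdefaultencoding()` or on the console. To make redirected `print` output use utf-8 instead, the `PYTHONIOENCODING` environment variable can be set to `utf-8` before starting Python.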

Community
jfs