How do I store the content "আপনার" as UTF-8, so that it still displays as "আপনার"? I have tried the following:

>>> content = "আপনার"
>>> content
'\xe0\xa6\x86\xe0\xa6\xaa\xe0\xa6\xa8\xe0\xa6\xbe\xe0\xa6\xb0'

>>> content = "আপনার".encode("UTF-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

>>> content = "আপনার".decode("UTF-8")
>>> content
u'\u0986\u09aa\u09a8\u09be\u09b0'
Rakib

1 Answer

Your last attempt, decode("UTF-8"), works, but you have to use print content instead of just typing content, because the interactive prompt echoes the repr of a value:

>>> content = "আপনার".decode("UTF-8")
>>> print content
আপনার
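
To go the other way and get UTF-8 bytes for storage, encode the unicode object, then decode again when reading it back. A minimal sketch, assuming illustrative variable names and using an escaped literal so that it runs unchanged on Python 2.7 and Python 3:

```python
# The word from the question, written as escaped code points
text = u"\u0986\u09aa\u09a8\u09be\u09b0"

# Encode to UTF-8 bytes, e.g. before writing to a file or socket
utf8_bytes = text.encode("utf-8")

# Decode the bytes back into a unicode object after reading
restored = utf8_bytes.decode("utf-8")

assert restored == text
assert len(text) == 5          # five code points
assert len(utf8_bytes) == 15   # each one takes three bytes in UTF-8
```

The 15 bytes are the same ones the prompt echoed for the plain literal in the question.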

__str__ and __repr__

This is the difference between the __str__ and __repr__ formats of an object. The first is meant to be human-readable; the second is meant to expose internals and be unambiguous. You can read more in Difference between __str__ and __repr__ in Python.
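
As a rough illustration of the two methods, here is a small sketch with a hypothetical Point class (not part of the question) that defines both. str() and print use __str__, while the interactive prompt and repr() use __repr__:

```python
class Point(object):
    """Hypothetical class, used only to illustrate the two methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form, aimed at developers
        return "Point(x=%r, y=%r)" % (self.x, self.y)

    def __str__(self):
        # Readable form, used by print and str()
        return "(%s, %s)" % (self.x, self.y)


p = Point(1, 2)
readable = str(p)    # "(1, 2)"
internal = repr(p)   # "Point(x=1, y=2)"
```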

String representation

>>> print unicode(content)
আপনার

__repr__ representation

>>> print repr(content)
u'\u0986\u09aa\u09a8\u09be\u09b0'
Adam Matan
  • So if I were to store this content in MongoDB and expose it to third-party apps via a REST API, would it show any inconsistency? – Rakib May 16 '15 at 12:24
  • Be consistent with the representation you send to and receive from Mongo, and you'll be fine. I think that sending a `unicode` object - which you have in hand - is sufficient. – Adam Matan May 16 '15 at 12:33
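
To make the point in the comments concrete, here is a hedged sketch of what a REST layer might do with such a value: keep it as unicode internally and encode to UTF-8 only at the boundary. The "content" key and the variable names are assumptions for illustration, and no MongoDB call is shown. The code runs on both Python 2.7 and Python 3:

```python
import json

text = u"\u0986\u09aa\u09a8\u09be\u09b0"

# Serialise for the wire: JSON text first, then UTF-8 bytes.
# ensure_ascii=False keeps the characters readable in the payload.
payload = json.dumps({u"content": text}, ensure_ascii=False).encode("utf-8")

# A client decodes the bytes and parses the JSON
received = json.loads(payload.decode("utf-8"))

assert received[u"content"] == text
```

With the default ensure_ascii=True the payload would contain \uXXXX escapes instead, which is equally valid JSON and parses back to the same string.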