Usage of unicode() and encode() functions in Python

Question

I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with the encode("utf-8") function, which didn't help. Then I used the unicode() function, which gives me the unicode type.

print type(path)                  # <type 'unicode'>
path = path.replace("one", "two") # <type 'str'>
path = path.encode("utf-8")       # <type 'str'> strange
path = unicode(path)              # <type 'unicode'>

Finally I got the unicode type, but I still get the same error that appeared when the path variable was of type str:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Could you help me solve this error and explain the correct usage of the encode("utf-8") and unicode() functions? I often struggle with this.

This execute() statement raised the error:

cur.execute("update docs set path = :fullFilePath where path = :path", locals())

I forgot to change the encoding of the fullFilePath variable, which suffers from the same problem, but I'm quite confused now. Should I use only unicode(), only encode("utf-8"), or both?

I can't use

fullFilePath = unicode(fullFilePath.encode("utf-8"))

because it raises this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 32: ordinal not in range(128)

Python version is 2.7.2

Your exact question has already been answered: http://stackoverflow.com/questions/2392732/sqlite-python-unicode-and-non-utf-data – garnertb, Apr 23 '12 at 20:51
Learning how Python 3 [handles](http://docs.python.org/dev/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit) text and data has really helped me understand everything. It is then easy to apply the knowledge to Python 2. – Oleh Prypin, Apr 23 '12 at 21:04
Here are the slides of a great talk about Unicode in Python: [link](http://farmdev.com/talks/unicode/) – bachr, Aug 23 '14 at 13:05

newtover · Answer 1 · 2020-11-18T15:32:16.100

134

str is a text representation in bytes; unicode is a text representation in characters.

You decode text from bytes into unicode, and encode unicode into bytes, using some encoding.

That is:

>>> 'abc'.decode('utf-8')  # str to unicode
u'abc'
>>> u'abc'.encode('utf-8') # unicode to str
'abc'

UPD Sep 2020: The answer was written when Python 2 was mostly used. In Python 3, str was renamed to bytes, and unicode was renamed to str.

>>> b'abc'.decode('utf-8') # bytes to str
'abc'
>>> 'abc'.encode('utf-8')  # str to bytes
b'abc'
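The same round trip in Python 3 with a non-ASCII character (š, U+0161, one of the characters mentioned in the comments under the accepted answer) shows that one character can become several bytes, and that decoding only gives back the original text when it uses the encoding the bytes were written in. This is a minimal sketch; the sample string is made up for illustration.

```python
text = 'Koš'                     # str: a sequence of characters
data = text.encode('utf-8')      # bytes: 'š' becomes two bytes, 0xc5 0xa1
print(len(text), len(data))      # 3 4
print(data)                      # b'Ko\xc5\xa1'
print(data.decode('utf-8') == text)  # True: the same codec round-trips
print(data.decode('latin-1'))    # KoÅ¡ : the wrong codec yields the wrong characters
```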

edited Nov 18 '20 at 15:32

answered Apr 23 '12 at 21:08

newtover


Very good answer, straight to the point. I'd add that `unicode` is about letters or symbols, or more generically **runes**, while `str` represents a byte string in a certain encoding that you must `decode` (obviously with the correct encoding) to get the specific runes – arainone Aug 28 '17 at 14:44

Python 3.8 >> `'str' object has no attribute 'decode'` – Yohan Obadia Nov 15 '20 at 10:30
Do you have documentation for the change from unicode to str? I can't find it – cikatomo Nov 21 '20 at 17:04

@cikatomo It's one of the key changes in Python 3: https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit – newtover Nov 21 '20 at 23:22

Andrew Clark · Accepted Answer · 2012-04-24T16:07:42.310

88

You are using encode("utf-8") incorrectly. Python byte strings (str type) have an encoding; Unicode strings do not. You can convert a Unicode string to a Python byte string using uni.encode(encoding), and you can convert a byte string to a Unicode string using s.decode(encoding) (or equivalently, unicode(s, encoding)).
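In Python 3 terms (str and bytes instead of unicode and str), the two directions and the `unicode(s, encoding)` equivalence look like the sketch below. This is not code from the answer. The last part reproduces the kind of error in the question: somewhere in `unicode(fullFilePath.encode("utf-8"))`, Python 2 implicitly decodes UTF-8 bytes with its default ASCII codec, which cannot handle byte 0xc5. Python 3 does not make that implicit conversion, so ASCII is requested explicitly here.

```python
raw = 'Koš'.encode('utf-8')    # bytes, like a Python 2 str holding UTF-8
text = raw.decode('utf-8')     # bytes -> str (Python 2: s.decode(enc) or unicode(s, enc))
print(str(raw, 'utf-8') == text)  # True: str(b, encoding) is the same conversion
back = text.encode('utf-8')    # str -> bytes (Python 2: uni.encode(enc))

try:
    raw.decode('ascii')        # what Python 2's implicit conversion did
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)
```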

If fullFilePath and path are currently a str type, you should figure out how they are encoded. For example, if the current encoding is utf-8, you would use:

path = path.decode('utf-8')
fullFilePath = fullFilePath.decode('utf-8')
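A common pattern that follows from this is to decode byte strings once, where they enter the program, and to work only with text after that. Below is a minimal Python 3 sketch that assumes the incoming bytes are UTF-8. `to_text` is a hypothetical helper name, not a library function, and the sample paths are made up.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding bytes with the given (assumed) encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: in Python 3, str has no decode() method at all.
    return value

path = to_text(b'/home/user/Ko\xc5\xa1')              # bytes from outside, decoded once
full_file_path = to_text('/home/user/Koš/doc.txt')  # already str, returned unchanged
print(path, full_file_path)
```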

If this doesn't fix it, the actual issue may be that you are not using a Unicode string in your execute() call; try changing it to the following:

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())
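For comparison, Python 3's sqlite3 module accepts ordinary str values as parameters directly, including named placeholders like the ones in the question. The sketch below runs against an in-memory database. The table layout and sample paths are made up for illustration, and it passes an explicit dict instead of locals().

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table docs (path text)')  # hypothetical table layout
cur.execute('insert into docs (path) values (?)', ('/data/one/Koš.txt',))

params = {'path': '/data/one/Koš.txt', 'fullFilePath': '/data/two/Koš.txt'}
# Named placeholders are filled from the dict; str values need no manual encoding.
cur.execute('update docs set path = :fullFilePath where path = :path', params)
conn.commit()

print(cur.execute('select path from docs').fetchall())  # [('/data/two/Koš.txt',)]
```

Python 3 also accepts bytes parameters, but it binds them as BLOBs, not TEXT, so the usual advice still applies: decode to str before querying.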

edited Apr 24 '12 at 16:07

answered Apr 23 '12 at 21:15

Andrew Clark


This statement `fullFilePath = fullFilePath.decode("utf-8")` still raises the error `UnicodeEncodeError: 'ascii' codec can't encode characters in position 32-34: ordinal not in range(128)`. fullFilePath is a combination of a *str* and a string taken from a *text* column of a db table, which should be UTF-8 encoded. – xralf Apr 23 '12 at 21:25
According to [this](http://www.sqlite.org/datatype3.html), it can be UTF-8, UTF-16BE or UTF-16LE. Can I find out which one somehow? – xralf Apr 23 '12 at 21:31
@xralf, If you are combining different `str` objects you may be mixing encodings. Can you show the result of `print repr(fullFilePath)`? – Andrew Clark Apr 23 '12 at 21:34
I can show it only before the call of *decode()*. The problematic characters are \u0161 and \u0165. – xralf Apr 23 '12 at 21:46
@xralf - So it is already unicode? Try changing the execute call to unicode: `cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())` – Andrew Clark Apr 23 '12 at 21:51
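On the question in the comments about which encoding a database uses: SQLite reports it through the `encoding` pragma. Below is a minimal sketch using an in-memory database. A new database opened this way defaults to UTF-8 unless the pragma is changed before any content is created. With Python's sqlite3 and its default text_factory, TEXT values come back as str either way.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Reports the database's text encoding: 'UTF-8', 'UTF-16le' or 'UTF-16be'.
(encoding,) = conn.execute('PRAGMA encoding').fetchone()
print(encoding)
```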

score 1 · Answer 3 · answered Sep 26 '17 at 11:56


Make sure your locale settings are correct before running the script from the shell, e.g.:

$ locale -a | grep "^en_.\+UTF-8"
en_GB.UTF-8
en_US.UTF-8
$ export LC_ALL=en_GB.UTF-8
$ export LANG=en_GB.UTF-8

Docs: man locale, man setlocale.
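To check from inside Python which encodings these locale settings produce, you can query the standard library directly. This is a small Python 3 sketch. The printed values depend on the environment; after exports like the ones above, a typical Linux system reports UTF-8.

```python
import locale
import sys

# Encoding the locale prefers for text; open() uses it by default when no encoding is given.
print(locale.getpreferredencoding(False))
# Encoding Python uses to convert file names between str and bytes.
print(sys.getfilesystemencoding())
```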


kenorb

