Convert an int value to unicode

Question

I am using pySerial and need to send some values less than 255. If I send the int itself, the ASCII value of the int gets sent. So now I am converting the int into a unicode value and sending it through the serial port.

unichr(num_less_than_255);

However it raises this Exception:

'ascii' codec can't encode character u'\x9a' in position 24: ordinal not in range(128)

What's the best way to convert an int to unicode?

Python2 or Python3? (guessing Python2, but makes quite the difference) Are you quite sure `unichr` is the call crashing? How are you doing the actual sending of the unichr returned data? — Joachim Isaksson, Jul 13 '13 at 07:11
`unichr()` does not exist in Python 3, so this is Python 2. `unichr()` is named `chr()` in Python 3 (conversion to a Unicode character). — Eric O. Lebigot, Jul 13 '13 at 11:37

chasmani · Answer 1 · 2016-01-20T22:10:31.937

33

In Python 2 - Turn it into a string first, then into unicode.

str(integer).decode("utf-8")

Best way I think. Works with any integer, plus still works if you put a string in as the input.

Updated edit due to a comment: For Python 2 and 3 - This works on both but a bit messy:

str(integer).encode("utf-8").decode("utf-8")
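For reference, here is a small Python 3 sketch of what this conversion yields. It appears to produce the decimal digits of the number as text, which is not the same thing as a single byte carrying that value. The variable names and the example number are illustrative assumptions, not part of the answer.

```python
# Python 3 sketch; `value` is an arbitrary example number.
value = 154

# The str -> encode -> decode round trip yields the decimal digits as text.
as_text = str(value).encode("utf-8").decode("utf-8")
print(repr(as_text))

# Encoding that text for transmission gives one byte per digit.
digits_on_wire = as_text.encode("ascii")
print(len(digits_on_wire))

# A single raw byte carrying the value itself looks different.
raw_byte = bytes([value])
print(raw_byte)
```

If the goal is to send the number as one byte over a serial port, the last form is likely the one that matters.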

edited Jan 20 '16 at 22:10

answered Nov 12 '15 at 13:01


5

`str(integer).encode("utf-8").decode("utf-8")`, while ugly, will work on Python 2 and 3, whereas the above will only work on Python 2. – Ivan X Jan 10 '16 at 13:25

Steve Barnes · Accepted Answer · 2013-07-13T07:36:55.633

25

Just use chr(somenumber) to get a 1-byte value of an int, as long as it is less than 256 (in Python 2, chr() returns a byte string). pySerial will then send it fine.

If you are sending things over pySerial, it is a very good idea to look at the struct module in the standard library. It handles endianness and packing issues, as well as encoding, for just about every data type of 1 byte or more that you are likely to need.
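As a rough illustration of the endianness and packing point, the following Python 3 sketch packs a made-up two-field message in both byte orders. The message layout (`command`, `reading`) is a hypothetical example, and the serial port usage is only described in a comment, since it assumes an already opened pySerial port.

```python
import struct

# Hypothetical message layout: a 1-byte command followed by a 2-byte reading.
command = 0x9A
reading = 1000

little = struct.pack("<BH", command, reading)  # little-endian, no padding
big = struct.pack(">BH", command, reading)     # big-endian, no padding

print(little.hex())
print(big.hex())

# With pySerial, the packed bytes would then be passed to Serial.write(),
# e.g. ser.write(big), where `ser` is an already opened serial.Serial instance.
```

Which byte order is right depends on what the device at the other end of the line expects.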

edited Jul 13 '13 at 07:36

answered Jul 13 '13 at 07:25


@user2578666: If a response is useful to you and you mark it as accepted, it is only fair to up-vote it too. Welcome to StackOverflow! – Eric O. Lebigot Jul 13 '13 at 11:35
3

No rep yet. Got to earn it :-) – user2578666 Jul 13 '13 at 12:37
@user2578666: I see, I did not remember this rule. May your reputation grow fast. :) – Eric O. Lebigot Jul 14 '13 at 03:40
chr(32) is returning ' ' empty space and other numbers are working fine. How to overcome value 32? – Venu May 26 '16 at 13:50
7

`chr(32)` is also `0x20` which is the space character - what do you expect to see? – Steve Barnes May 26 '16 at 13:58

score 12 · Answer 3 · edited May 23 '17 at 12:03

I think that the best solution is to be explicit and say that you want to represent a number as a byte (and not as a character):

>>> import struct
>>> struct.pack('B', 128)
'\x80'

This makes your code work in both Python 2 and Python 3 (in Python 3, the result is, as it should be, a bytes object). An alternative, in Python 3, would be to use the new bytes([128]) to create a single byte of value 128.
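A minimal Python 3 sketch comparing the two spellings; it also shows that the 'B' format refuses values that do not fit in one byte. Variable names are illustrative.

```python
import struct

packed = struct.pack("B", 128)   # one unsigned byte
literal = bytes([128])           # Python 3 only: bytes built from a list of ints

print(packed == literal)
print(type(packed).__name__)

# Values outside 0..255 are rejected rather than silently truncated.
try:
    struct.pack("B", 256)
except struct.error as exc:
    print("rejected:", exc)
```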

I am not a big fan of the chr() solutions: in Python 3, they produce a (character, not byte) string that needs to be encoded before sending it anywhere (file, socket, terminal, …). In Python 3, chr() is equivalent to the problematic Python 2 unichr() of the question. The struct solution has the advantage of correctly producing a byte whatever the version of Python. If you want to send data over the serial port with chr(), you need to have control over the encoding that must take place subsequently. The code works when that encoding is Latin-1, because every Unicode character with a code point smaller than 256 is coded as a single byte of the same value in Latin-1. With UTF-8 (the usual default in Python 3), only code points smaller than 128 are coded as a single byte; values from 128 to 255 become two bytes. This adds an unnecessary layer of subtlety and complexity that I do not recommend (it makes the code harder to understand and, if necessary, debug).
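To make the encoding concern concrete, here is a short Python 3 sketch with an arbitrary example value in the 128 to 255 range. It suggests that the bytes actually sent depend on which codec is applied to the result of chr().

```python
value = 200                      # arbitrary example in the 128..255 range
char = chr(value)                # Python 3: a one-character str, not a byte

utf8 = char.encode("utf-8")      # two bytes for this code point
latin1 = char.encode("latin-1")  # one byte, numerically equal to `value`
direct = bytes([value])          # no text encoding involved

print(len(utf8), len(latin1), latin1 == direct)
```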

So, I strongly suggest that you use the approach above (which was also hinted at by Steve Barnes and Martijn Pieters): it makes it clear that you want to produce a byte (and not characters). It will not give you any surprise even if you run your code with Python 3, and it makes your intent clearer and more obvious.

Bravo @EOL - one of the (possibly many) misleading things about C that C++ inherited is the lack of any distinction between a string that has a length of 1, a single character (both text in the local encoding), and a byte. – Steve Barnes Jul 13 '13 at 19:02

Martijn Pieters · Answer 4 · 2013-07-13T07:31:19.687

10

Use the chr() function instead; you are sending a value below 256 but above 127, yet unichr() creates a Unicode character.

The unicode character then has to be encoded to get a byte character, and that encoding fails because you are using a value outside the ASCII range (0-127):

>>> str(unichr(169))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128)

This is normal Python 2 behaviour; when trying to convert a unicode string to a byte string, an implicit encoding has to take place and the default encoding is ASCII.
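The implicit step can be reproduced explicitly. The following is a Python 3 sketch (where chr() plays the role of Python 2's unichr()), assuming the ASCII codec is the one being applied: encoding a character above 127 fails in the same way, while a codec that covers the character succeeds.

```python
char = chr(169)  # U+00A9, the copyright sign (unichr(169) in Python 2)

try:
    char.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Naming a codec that covers the character avoids the error.
encoded = char.encode("latin-1")
print(encoded)
```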

If you were to use chr() instead, you create a byte string of one character and that implicit encoding does not have to take place:

>>> str(chr(169))
'\xa9'

Another method you may want to look into is the struct module, especially if you need to send integer values greater than 255:

>>> import struct
>>> struct.pack('!H', 1000)
'\x03\xe8'

The above example packs an integer into an unsigned short in network byte order.
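A brief Python 3 sketch of the round trip, so the receiving side can be checked as well. The wider '!I' format is shown only as one possible choice for values that do not fit in two bytes.

```python
import struct

packed = struct.pack("!H", 1000)           # 2 bytes, most significant first
(unpacked,) = struct.unpack("!H", packed)  # unpack always returns a tuple

print(packed, unpacked)

# "!H" covers 0..65535; a wider format such as "!I" (4 bytes) is needed beyond that.
wide = struct.pack("!I", 100000)
print(len(wide))
```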

edited Jul 13 '13 at 07:31

answered Jul 13 '13 at 07:25


I guess you meant "bytes", not "a byte character"? "byte character" is not a common expression, and is almost an [oxymoron](http://stackoverflow.com/questions/4545661/unicodedecodeerror-when-redirecting-to-file/4546129#4546129). Furthermore, the default encoding does not have to be ASCII: it is officially `sys.getdefaultencoding()`. – Eric O. Lebigot Jul 13 '13 at 07:42
@EOL: This is Python 2, this is a string object, which is really a sequence of bytes. But looping over it gives you strings of length 1; byte characters. – Martijn Pieters Jul 13 '13 at 07:43
@EOL: The default encoding **is** ASCII on Python 2, when it comes to implicit encodings (concatenating strings and unicode, comparing for equality, etc.). – Martijn Pieters Jul 13 '13 at 07:43
@EOL: Do not confuse that with the `print` statement / function encoding to the codec of `sys.stdout`. – Martijn Pieters Jul 13 '13 at 07:45
Do you have a reference that states this? I have always been looking for it. – Eric O. Lebigot Jul 13 '13 at 07:45
@EOL: See the [Unicode HOWTO](http://docs.python.org/2/howto/unicode.html#the-unicode-type): *if you leave off the `encoding` argument, the ASCII encoding is used for the conversion* – Martijn Pieters Jul 13 '13 at 07:46
The reference you give is about the `unicode()` function, not about how Unicode strings are encoded by default. You are saying ASCII, I understand the encoding used is `sys.getdefaultencoding()`. I may misunderstand the documentation, but I still can't find anything more explicit. – Eric O. Lebigot Jul 13 '13 at 07:58
@EOL: But you could have tried my example yourself in a Python prompt. `sys.getdefaultencoding()` is the codec that is used for encoding `print` output. Implicit conversions between unicode and string use `'ASCII'`. – Martijn Pieters Jul 13 '13 at 07:59
I'm ready to believe you about these two points. :) But is this in the documentation? – Eric O. Lebigot Jul 13 '13 at 08:00
I'll find you a reference later, out of time. But the same rules apply for all string conversions without explicit codec; Unicode to byte string and vice versa. – Martijn Pieters Jul 13 '13 at 08:04
I actually tried it: Python 2 does *not* generally use ASCII for concatenating strings and unicode strings. You can try `u"" + "é"` [with and without](http://stackoverflow.com/questions/3828723/why-we-need-sys-setdefaultencodingutf-8-in-a-py-script) `sys.setdefaultencoding('UTF8')`: it works if UTF-8 is used, and shows that concatenation *does* use `sys.getdefaultencoding()`. It is also used for printing to `sys.stdout` with a `None` encoding (which happens when the standard output is redirected to a file, in Python 2). – Eric O. Lebigot Jul 13 '13 at 08:09
If you set a different encoding you replaced the default. Note that `sys.setdefaultencoding()` is **removed** from `sys` *for a reason* and requires you to reload `sys` to get access to. – Martijn Pieters Jul 13 '13 at 09:25
Indeed (this was in the link I gave). I see where my confusion came from: with "the default encoding is ASCII", you meant that `sys.getdefaultencoding()` is by default ASCII, whereas I understood that the `str()` conversion uses ASCII. My bad, sorry. – Eric O. Lebigot Jul 13 '13 at 11:28
