1

I performed socket communication in Python 2 and it worked well; now I have to make it work in Python 3. I have tried str.encode() with many encodings, but the other side of the network can't recognize what I send. The only thing I know is that the Python 3 str type is encoded as Unicode UTF-8 by default, and I'm pretty sure the critical question here is what the format of the Python 2 str type is. I have to send exactly the same thing as what was stored in the Python 2 str. But the tricky thing is that a Python 3 socket only sends encoded bytes or other buffer-interface objects, rather than a str holding the raw data as in Python 2. The example is as follows:

In python2:

fulldata = 'AA060100B155'
datasplit = [fulldata[i: i+2] for i in range(0, len(fulldata), 2)]
senddata = ''
for item in datasplit:
    itemdec = chr(int(item, 16))
    senddata += itemdec
print(repr(senddata))
# '\xaa\x06\x01\x00\xb1U', which is the data I need

In Python 3, it seems a socket can only send the encoded bytes produced by "senddata.encode()", but that is not the format I want. You can try:

print(senddata.encode('latin-1'))
#b'\xaa\x06\x01\x01\xb2U'

to see the difference between the two senddata values. Interestingly, the data is encoded incorrectly when using UTF-8.

The data stored in the Python 3 str is what I need, but my question is: how do I send the data of that string without encoding it? Or how can I get the behavior of the Python 2 str type in Python 3?

Can anyone help me with this?

lwangreen

4 Answers

2

You can convert the whole string to an integer, then use the integer method to_bytes to convert it into a bytes object:

fulldata = 'AA060100B155'

senddata = int(fulldata, 16).to_bytes(len(fulldata)//2, byteorder='big')
print(senddata)

# b'\xaa\x06\x01\x00\xb1U'

The first parameter of to_bytes is the number of bytes, the second (required) is the byteorder. See int.to_bytes in the official documentation for reference.
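As a rough sketch of the same idea, the conversion can be wrapped in a small helper. The name `hex_to_bytes` is made up for illustration and is not part of the answer above. The second call shows why the byte count has to be passed explicitly: without it, leading zero bytes in the hex string could not be preserved.

```python
def hex_to_bytes(hexstr):
    """Convert a hex string such as 'AA06' to the bytes it spells out.

    Hypothetical helper; assumes an even number of hex digits (Python 3).
    """
    nbytes = len(hexstr) // 2
    return int(hexstr, 16).to_bytes(nbytes, byteorder='big')


senddata = hex_to_bytes('AA060100B155')
print(senddata)

# Leading zero bytes survive because the length is given explicitly.
padded = hex_to_bytes('0001FF')
print(padded)
```

In Python 3 the built-in `bytes.fromhex('AA060100B155')` should give the same result and may be simpler when no integer arithmetic is needed.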

Thierry Lathuille
2

There are various ways to do this. Here's one that works in both Python 2 and Python 3.

from binascii import unhexlify

fulldata = 'AA060100B155'
senddata = unhexlify(fulldata)
print(repr(senddata))

Python 2 output

'\xaa\x06\x01\x00\xb1U'

Python 3 output

b'\xaa\x06\x01\x00\xb1U'
PM 2Ring
  • It is not a matter of what I send; it is all about what the other side can recognize. Do you know what the difference is between '\xaa\x06\x01\x00\xb1U' and the same literal with a 'b' in front of it? – lwangreen Apr 17 '17 at 12:51
  • @lwangreen In Python 2, there's no difference. In Python 3, `b'\xaa\x06\x01\x00\xb1U'` is a bytes string that contains _exactly_ the same bytes as Python 2's `b'\xaa\x06\x01\x00\xb1U'` or `'\xaa\x06\x01\x00\xb1U'`. However, `'\xaa\x06\x01\x00\xb1U'` in Python 3 is the same as `u'\xaa\x06\x01\x00\xb1U'` (in either Python 2 or Python 3). And you can convert that to the previous bytes string using `u'\xaa\x06\x01\x00\xb1U'.encode('latin-1')`. That's because Latin-1 is a subset of Unicode. – PM 2Ring Apr 17 '17 at 13:07
  • @lwangreen With `fulldata = 'AA060100B155'` your Python 2 code sends `'\xaa\x06\x01\x00\xb1U'`. So if your Python 3 code sends the bytes string `b'\xaa\x06\x01\x00\xb1U'` over the socket they will get _exactly_ the same bytes. – PM 2Ring Apr 17 '17 at 13:19
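The point made in these comments can be checked with a short Python 3 sketch. It rebuilds the text string from the question with chr() and encodes it two ways; the variable names `text`, `latin1` and `utf8` are chosen here for illustration only.

```python
fulldata = 'AA060100B155'

# Build a text string whose code points equal the wanted byte values,
# as the loop in the question does.
text = ''.join(chr(int(fulldata[i:i + 2], 16))
               for i in range(0, len(fulldata), 2))

latin1 = text.encode('latin-1')  # one byte per code point below 256
utf8 = text.encode('utf-8')      # code points above 127 become two bytes

print(latin1)
print(utf8)
print(len(latin1), len(utf8))
```

Latin-1 maps each code point from 0 to 255 to the single byte with the same value, which is why it reproduces the Python 2 string, while UTF-8 expands `\xaa` and `\xb1` into two bytes each.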
2

I performed socket communication in Python 2 and it worked well; now I have to make it work in Python 3. I have tried str.encode() with many encodings, but the other side of the network can't recognize what I send.

You have to make sure that whatever you send is decodable by the other side. The first step is to find out which encoding the network/file/socket on the other end is using. If, for instance, you send data encoded as UTF-8 and the client expects ASCII, this will work as long as the text contains only ASCII characters. But if, say, cp500 is the client's encoding and you send the string encoded as UTF-8, this won't work. It's better to pass the name of your desired encoding explicitly to functions, because the default encoding of your platform is not necessarily UTF-8. You can always check the default encoding by calling sys.getdefaultencoding().
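A small sketch of that point, assuming Python 3 and an ASCII-only sample string picked purely for illustration: the same text gives identical bytes under ASCII and UTF-8, but different bytes under an EBCDIC codec such as cp500.

```python
import sys

text = 'ABC'  # ASCII-only sample text, chosen for illustration

as_ascii = text.encode('ascii')
as_utf8 = text.encode('utf-8')
as_cp500 = text.encode('cp500')

# ASCII-only text gives identical bytes in ASCII and UTF-8 ...
print(as_ascii == as_utf8)
# ... but cp500 (EBCDIC) assigns different byte values to the same letters.
print(as_cp500 == as_utf8)

# The interpreter's default encoding, for reference.
print(sys.getdefaultencoding())
```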

The only thing I know is that the Python 3 str type is encoded as Unicode UTF-8 by default, and I'm pretty sure the critical question here is what the format of the Python 2 str type is. I have to send exactly the same thing as what was stored in the Python 2 str. But the tricky thing is that a Python 3 socket only sends encoded bytes or other buffer-interface objects, rather than a str holding the raw data as in Python 2.

Yes, Python 3.X uses UTF-8 as the default encoding, but this is not guaranteed: in some cases the default encoding could be changed, so it's better to pass the name of the desired encoding explicitly to avoid such cases. Note, though, that str in Python 3.X is the equivalent of unicode + str in 2.X, whereas str in 2.X supports only 8-bit (1-byte) characters with values 0-255.

On one hand, your problem seems to be with 3.X and its type distinction between str and bytes strings. APIs that expect bytes won't accept str in 3.X. This is unlike 2.X, where you can mix unicode and str freely. This distinction in 3.X makes sense, given that str represents decoded strings and is used for textual data, whereas bytes represents encoded strings as raw bytes with absolute byte values.
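To illustrate the distinction, here is a minimal Python 3 sketch. It uses `socket.socketpair()` as a stand-in for the real peer, which is an assumption made only so the example is self-contained; the actual connection in the question is not shown. Passing a str to `sendall` is rejected, while the bytes object goes through unchanged.

```python
import socket

# A connected pair of local sockets stands in for the real peer
# (an assumption for illustration; the real device is not modeled).
left, right = socket.socketpair()

payload = b'\xaa\x06\x01\x00\xb1U'

try:
    left.sendall('\xaa\x06\x01\x00\xb1U')  # str: rejected in Python 3
    str_accepted = True
except TypeError:
    str_accepted = False

left.sendall(payload)  # bytes: accepted

# Read until the whole payload has arrived.
received = b''
while len(received) < len(payload):
    received += right.recv(len(payload) - len(received))

left.close()
right.close()

print(str_accepted)
print(received == payload)
```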

On the other hand, you have a problem choosing the right encoding for the text in 3.X that you need to pass to the client. First, check which encoding your client uses. Second, pass the string encoded with your client's encoding scheme so the client can decode it properly: str.encode('same-encoding-as-client').

Because you pass your data as str in 2.X and it works, I suspect your client most likely uses an 8-bit character encoding; something like Latin-1 might be the encoding used by your client.

GIZ
  • Thanks for your long explanation! I will have a talk with the client. – lwangreen Apr 17 '17 at 12:59
  • @lwangreen Also see: [Unicode HOWTO](https://docs.python.org/3.3/howto/unicode.html). – GIZ Apr 17 '17 at 13:07
  • OK, I have one question. Is there a default encoding scheme for the Python 2 str? From your answer I think the answer is no. We actually perform hardware control through a Python application, and there might be no encoding scheme on the circuit board either. That's probably the reason why I can communicate using a Python 2 str but not a Python 3 one. Your opinion? – lwangreen Apr 18 '17 at 00:24
  • There's no encoding for `str` in Python 2.X; it is simply a raw bytes string: [What encoding do normal python strings use?](https://stackoverflow.com/questions/3547534/what-encoding-do-normal-python-strings-use). – GIZ Apr 18 '17 at 07:54
  • Your second question: the reason your code works in 2.X when you send `str` is that `str` is raw data. But I don't see why you can't make your data raw and send it as `bytes` objects in 3.X. `bytes` does exist in 2.X for forward compatibility, where it is simply `str`. Interfacing with hardware would likely require raw data, of course. Isn't `str` in 2.X raw after all? So if you would like to have 2.X's `str` in 3.X, it's just called `bytes`, with minor differences. – GIZ Apr 18 '17 at 08:00
0

The following is Python 2/3 compatible. The unhexlify function converts hexadecimal notation to bytes. Use a byte string and you don't have to deal with Unicode strings. Python 2 uses byte strings by default, but it also recognizes the b'' syntax that Python 3 requires for a byte string.

from binascii import unhexlify
fulldata = b'AA060100B155'
print(repr(unhexlify(fulldata)))

Python 2 output:

'\xaa\x06\x01\x00\xb1U'

Python 3 output:

b'\xaa\x06\x01\x00\xb1U'
Mark Tolonen