15

I'm digging around with Python and networking.

while True:
    data = sock.recv(10240)

This is definitely receiving data, but the result seems to need converting to a text string.

I've seen some people using struct.unpack(), but I'm not sure exactly how it works. What's the way to convert?

Jon Clements
coffeemonitor
  • What version of python are you using? The answer will be different for 2.x versus 3.x. – Joshua D. Boyd Dec 20 '12 at 20:16
  • Version 3.3.0. As I understand it, 2.x is different from 3.x in certain networking functionality. – coffeemonitor Dec 20 '12 at 20:18
  • @coffeemonitor: It's not all that different in networking functionalities—but it's pretty different in text-handling functionalities, which is why Joshua D. Boyd asked that question. – abarnert Dec 20 '12 at 20:23

2 Answers

31

What you get back from recv is a bytes string:

Receive data from the socket. The return value is a bytes object representing the data received.

In Python 3.x, to convert a bytes object into a str (Unicode text), you have to know which character encoding the sender used, so you can call decode with it. For example, if it's UTF-8:

stringdata = data.decode('utf-8')

(In Python 2.x, bytes is the same thing as str, so you've already got a string. But if you want Unicode text, i.e. a unicode object, you call decode the same way as in 3.x.)
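As a rough illustration of the round trip in Python 3 (the payload bytes here are made up, and UTF-8 is only an assumption about what the other side sends):

```python
# Bytes as they might arrive from sock.recv(); the payload is invented for illustration.
data = b"caf\xc3\xa9\n"

# bytes -> str: the encoding must match what the sender used (assumed UTF-8 here).
text = data.decode("utf-8")

# str -> bytes: encode before handing the result to sock.send() or sock.sendall().
reply = text.upper().encode("utf-8")

# Invalid or truncated input raises UnicodeDecodeError by default;
# errors="replace" substitutes U+FFFD instead of raising.
lossy = b"caf\xc3".decode("utf-8", errors="replace")
```

Whether to use errors="replace" depends on whether silently losing data is acceptable for your protocol; the strict default is usually the safer starting point.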

The reason people often use struct is that the data isn't just 8-bit or Unicode text, but some other format, typically fixed-size binary fields that struct.unpack turns into Python values. Text can also arrive wrapped in a framing format. For example, you might send each message as a "netstring": a length (as a string of ASCII digits) followed by a : separator, then length bytes of UTF-8, then a , terminator, such as b"3:Abc,". (There are variants on the format, but this is the Bernstein standard netstring.)
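Here is a minimal sketch of both ideas. The binary header layout (a 2-byte message type plus a 4-byte length) and the helper names parse_header, encode_netstring and decode_netstring are hypothetical, chosen only for illustration; the netstring part follows the length:payload, layout described above.

```python
import struct

# Hypothetical binary header: a 2-byte unsigned message type followed by a
# 4-byte unsigned payload length, both in network (big-endian) byte order.
HEADER = struct.Struct("!HI")

def parse_header(buf):
    """Return (msg_type, length) from the first HEADER.size bytes of buf."""
    return HEADER.unpack(buf[:HEADER.size])

def encode_netstring(payload):
    """Wrap payload bytes as b'<length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_netstring(buf):
    """Parse one netstring from the start of buf.

    Returns (payload, rest), or (None, buf) if buf does not yet hold a
    complete netstring.
    """
    head, sep, tail = buf.partition(b":")
    if not sep:
        return None, buf          # length prefix not complete yet
    length = int(head)            # raises ValueError on a malformed prefix
    if len(tail) < length + 1:
        return None, buf          # payload or trailing comma still missing
    if tail[length:length + 1] != b",":
        raise ValueError("netstring is missing its trailing comma")
    return tail[:length], tail[length + 1:]
```

The payload returned by decode_netstring is still bytes, so you would call .decode('utf-8') on it afterwards if it carries text.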

The reason people use netstrings, or other similar techniques, is that you need some way to delimit messages when you're using TCP. Each recv could give you half of what the other side passed with send, or it could give you 3 sends and part of the 4th. So, you have to accumulate a buffer of recv data, and then pull the messages out of it. And you need some way to tell when one message ends and the next begins. If you're just sending plain text messages without any newlines, you can just use newlines as a delimiter. Otherwise, you'll have to come up with something else: maybe netstrings, or using \0 as a delimiter, or using newlines as a delimiter but escaping actual newlines within the data, or using some self-delimited structured format like JSON.
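The buffering described above could be sketched like this for the simplest case, newline-delimited UTF-8 text. The names split_lines and read_messages are hypothetical, and sock is assumed to be an already-connected TCP socket:

```python
def split_lines(buffer, chunk):
    """Append chunk to buffer; return (complete_messages, leftover_buffer)."""
    buffer += chunk
    # Everything before the last b"\n" is complete; the final piece is
    # either empty or the start of a message that has not fully arrived.
    *messages, buffer = buffer.split(b"\n")
    return messages, buffer

def read_messages(sock, bufsize=4096):
    """Yield decoded newline-terminated messages from a connected socket."""
    buffer = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:        # empty bytes means the peer closed the connection
            break
        messages, buffer = split_lines(buffer, chunk)
        for raw in messages:
            # Decode only whole messages, so a character split across
            # two recv calls is never decoded in halves.
            yield raw.decode("utf-8")
```

You would use it as `for message in read_messages(sock): ...`. Anything left in the buffer when the connection closes is an unterminated message, which this sketch simply drops.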

abarnert
  • And it works! The recv just needed that conversion. I assume if I'm to send data back to its source, I'll have to encode it? – coffeemonitor Dec 20 '12 at 20:41
  • @coffeemonitor: Exactly, if you've got a string, encode it and `send` the results. – abarnert Dec 20 '12 at 21:19
  • @abarnert would you mind sharing exactly how to determine whether a message ends with half a code point (the other half being in the next message)? For example, if you are reading from a socket and you know the data will be UTF-8, how can you know when to call .decode() on the bytes when you don't know whether the last byte completes a valid UTF-8 code point? – dylnmc Sep 15 '16 at 03:05
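
On the split-character question in the last comment: one possible approach, if you want to decode as data arrives instead of waiting for a whole message, is an incremental decoder from the standard codecs module. A small sketch, with the two chunks made up to simulate a character split across recv calls:

```python
import codecs

# An incremental decoder holds back any trailing partial character
# and emits it once the remaining bytes arrive.
decoder = codecs.getincrementaldecoder("utf-8")()

# "é" is b"\xc3\xa9" in UTF-8; here it is split across two hypothetical recv() chunks.
first = decoder.decode(b"caf\xc3")
second = decoder.decode(b"\xa9!")

# At end of stream, final=True raises UnicodeDecodeError if bytes are still pending.
tail = decoder.decode(b"", final=True)
```

If your protocol already delimits messages, decoding each complete message in one go avoids the problem entirely, since a well-formed message never ends mid-character.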
5

In Python 2.7.x and before, data is already a string. In Python 3.x, data is a bytes object. To convert bytes to a string, use the decode() method. decode() takes an encoding argument, like 'utf-8' (which is also the default in Python 3 if you omit it).

Joshua D. Boyd