76

When I try to receive larger amounts of data, the output gets cut off and I have to press Enter to get the rest. I increased the buffer size passed to conn.recv(), which helped a little, but the data is still cut off at a certain point, and I have to press Enter at my raw_input prompt to receive the remainder. Is there any way to get all of the data at once? Here's the code.

import socket

port = 7777
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('0.0.0.0', port))
sock.listen(1)
print ("Listening on port: "+str(port))
while 1:
    conn, sock_addr = sock.accept()
    print "accepted connection from", sock_addr
    while 1:
        command = raw_input('shell> ')
        conn.send(command)
        data = conn.recv(8000)
        if not data: break
        print data,
    conn.close()
Gil Hamilton
user2585107

13 Answers

162

TCP is a stream-based protocol, not a message-based protocol. There's no guarantee that each send() call by one peer results in a single recv() call by the other peer receiving exactly the data sent. The receiver might get the data piecemeal, split across multiple recv() calls, because TCP segments and buffers the byte stream without regard to how the sender wrote it.
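
To see the stream behaviour in isolation, here is a minimal sketch (not part of the original answer) that pushes one large payload through a local `socket.socketpair()` and records how many bytes each `recv()` call returns. The helper name `recv_sizes` and the payload size are assumptions chosen for illustration, and the exact chunk sizes will differ between platforms.

```python
import socket
import threading

def recv_sizes(payload, bufsize=8000):
    """Send payload over a local socket pair; return the size of each recv() result."""
    sender, receiver = socket.socketpair()

    def send_and_close():
        sender.sendall(payload)   # one logical "message" from the sender's point of view
        sender.close()            # closing signals end-of-stream to the receiver

    worker = threading.Thread(target=send_and_close)
    worker.start()

    sizes = []
    while True:
        chunk = receiver.recv(bufsize)
        if not chunk:             # empty bytes object: the peer closed the connection
            break
        sizes.append(len(chunk))

    worker.join()
    receiver.close()
    return sizes

sizes = recv_sizes(b"x" * 100_000)
print(len(sizes), "recv() calls delivered", sum(sizes), "bytes")
```

No single `recv(8000)` call can return the whole payload, which is the situation the question's loop runs into.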

You need to define your own message-based protocol on top of TCP in order to differentiate message boundaries. Then, to read a message, you continue to call recv() until you've read an entire message or an error occurs.

One simple way of sending a message is to prefix each message with its length. Then to read a message, you first read the length, then you read that many bytes. Here's how you might do that:

import struct

def send_msg(sock, msg):
    # Prefix each message with a 4-byte length (network byte order)
    msg = struct.pack('>I', len(msg)) + msg
    sock.sendall(msg)

def recv_msg(sock):
    # Read message length and unpack it into an integer
    raw_msglen = recvall(sock, 4)
    if not raw_msglen:
        return None
    msglen = struct.unpack('>I', raw_msglen)[0]
    # Read the message data
    return recvall(sock, msglen)

def recvall(sock, n):
    # Helper function to recv n bytes or return None if EOF is hit
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return data

Then you can use the send_msg and recv_msg functions to send and receive whole messages, and they won't have any problems with packets being split or coalesced on the network level.
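
As a usage sketch, the following self-contained example restates the three functions above in compact form and exchanges two messages over a local `socket.socketpair()`. The socket pair and the sample payloads are assumptions made for illustration; in the question's program you would pass `conn` instead.

```python
import socket
import struct

def send_msg(sock, msg):
    # 4-byte big-endian length prefix followed by the payload
    sock.sendall(struct.pack('>I', len(msg)) + msg)

def recvall(sock, n):
    # Keep calling recv() until exactly n bytes have arrived, or return None on EOF
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return data

def recv_msg(sock):
    raw_msglen = recvall(sock, 4)
    if not raw_msglen:
        return None
    msglen = struct.unpack('>I', raw_msglen)[0]
    return recvall(sock, msglen)

a, b = socket.socketpair()
send_msg(a, b"first")
send_msg(a, b"second message")   # sent back-to-back; both may arrive in one recv()
a.close()

first = recv_msg(b)
second = recv_msg(b)
end = recv_msg(b)                # None: the peer closed the connection
b.close()
print(first, second, end)
```

Even if both messages are delivered together, the length prefix lets the receiver split them correctly.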

Adam Rosenfield
  • 2
    I am not sure if I am understanding this completely. I understand what's supposed to be happening, but I can't seem to get it working. I am getting `Exception: Socket EOF trying to recv 4 bytes`. I am using the following: http://pastebin.com/raw.php?i=AvdN5RyW – user2585107 Jul 16 '13 at 05:22
  • @user2585107: Try the updated version, which uses a `return None` instead of raising an exception when the stream ends. – Adam Rosenfield Jul 16 '13 at 20:42
  • shouldn't the `packet` be `.decode()`ed before adding it to `data` or `recv()` can receive both bytes and strings? – Siemkowski Sep 21 '17 at 08:49
  • Thanks bro, was going bald pulling my hair out :D – Illegal Operator Jul 01 '19 at 21:04
  • Why doesn't `recv` block until it has received the amount of data specified? – Jean Jul 03 '19 at 19:19
  • @Jean: Most of the time the application doesn't even know how much data it will be receiving in advance (e.g. consider an HTTP GET request). Applications will typically call `recv()` with a large buffer and then process however much data they received. If the server blocked the `recv()` call until it received a full buffer, it might be waiting forever if a request was smaller than a buffer. The server does have to know when it's received a full vs. partial request, which depends on the protocol being used, which is outside the scope of `recv()`. With HTTP, a request is terminated by \r\n\r\n. – Adam Rosenfield Jul 10 '19 at 23:03
  • 5
    The line `data += packet` can make receiving VERY slow for large messages. It's much better to use `data = bytearray()` and then `data.extend(packet)`. – Stan Kriventsov Sep 26 '19 at 23:04
  • @StanKriventsov: That's an excellent point, I've updated the sample with that. I would guess that the Python runtime (of course there are many different runtime implementations) would have an optimization for this case to avoid the copy and perform the extension internally when the buffer's refcount is 1, but it's certainly not guaranteed and we shouldn't rely on that. – Adam Rosenfield Oct 17 '19 at 15:04
  • I certainly haven't tried every Python runtime for this task, but at least none of the versions that I use have this optimization. – Stan Kriventsov Oct 18 '19 at 17:33
  • Thanks in advance for the answer! However, I am getting this error: "BlockingIOError: [Errno 35] Resource temporarily unavailable" at "packet = sock.recv(n - len(data))" – Mike Zhu Jul 10 '20 at 13:04
35

You can use it as: data = recvall(sock)

def recvall(sock):
    BUFF_SIZE = 4096 # 4 KiB
    data = b''
    while True:
        part = sock.recv(BUFF_SIZE)
        data += part
        if len(part) < BUFF_SIZE:
            # either 0 or end of data
            break
    return data
trilobyte
JadedTuna
  • 7
    This works for detection of "End of File", but not for keeping a connection open and detecting the end of a message. "End of File" will only be reached if the peer closes its part of the socket, or at least half-closes it. – glglgl Jul 17 '13 at 13:46
  • 10
    If the string received is less than 4096 chars, it will loop again and re-check for more data using `sock.recv()`. This will hang since no more data is coming in. If the length of `part` is less than that of the `RECV_BUFFER`, then the code can safely break out of the loop. – SomeGuyOnAComputer Dec 03 '15 at 23:15
  • 3
    @JadedTuna, doesn't seem to be fixed. The line "part = sock.recv(BUFF_SIZE)" seems to be a blocking call, thus execution hangs at this line once the full message has been received. – sh37211 Jun 07 '17 at 21:48
  • 1
    This code should be fixed as `if len(part) < BUFF_SIZE: break  # either 0 or end of data` – Hungry Mind Nov 22 '17 at 12:44
  • 4
    This seems to wrongly assume that one send on one end of a TCP socket corresponds to one receive of sent number of bytes on the other end (see e.g. [here](https://stackoverflow.com/a/30655169/3002584) or [here](https://stackoverflow.com/a/1806965/3002584)). Thus, even when a client sends exactly 4kb with one `send`, server might get the first, say, 1kb at the first `recv`, which would lead the `while` to break. – OfirD Apr 05 '19 at 15:21
20

The accepted answer is fine, but it will be really slow with big files. Strings are immutable, so a new object is created every time you use the + sign; collecting the chunks in a list and joining them once at the end is more efficient.

This should work better

fragments = []
while True:
    chunk = s.recv(10000)
    if not chunk:
        break
    fragments.append(chunk)

print(b"".join(fragments))
Connor
Mina Gabriel
18

Most of the answers describe some sort of recvall() method. If your bottleneck when receiving data is building the byte array in a loop, here are three approaches to accumulating the received data in recvall() that I benchmarked:

Byte string method:

arr = b''
while len(arr) < msg_len:
    arr += sock.recv(max_msg_size)

List method:

fragments = []
while True: 
    chunk = sock.recv(max_msg_size)
    if not chunk: 
        break
    fragments.append(chunk)
arr = b''.join(fragments)

Pre-allocated bytearray method:

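arr = bytearray(msg_len)
pos = 0
while pos < msg_len:
    # recv() may return fewer bytes than requested, so advance by the actual count
    chunk = sock.recv(min(max_msg_size, msg_len - pos))
    if not chunk:
        break  # peer closed the connection before msg_len bytes arrived
    arr[pos:pos+len(chunk)] = chunk
    pos += len(chunk)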

Results:

[Image: benchmark results for the three methods; not available in this copy]
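
Since the chart is not reproduced here, the following is a rough sketch of how such a comparison could be rerun locally. It is an assumption-laden stand-in rather than the original benchmark: it measures only the accumulation strategies, replaces `sock.recv()` with a fixed in-memory chunk, and the names `CHUNK` and `N_CHUNKS` are illustrative. Timings depend on the interpreter and machine.

```python
import timeit

CHUNK = b"x" * 4096
N_CHUNKS = 256   # 1 MiB in total; raise this to make the differences more visible

def concat_bytes():
    # Byte string method: each += builds a new bytes object
    data = b""
    for _ in range(N_CHUNKS):
        data += CHUNK
    return data

def join_list():
    # List method: collect the chunks, join once at the end
    fragments = []
    for _ in range(N_CHUNKS):
        fragments.append(CHUNK)
    return b"".join(fragments)

def prealloc_bytearray():
    # Pre-allocated bytearray method: copy each chunk into its slot
    data = bytearray(N_CHUNKS * len(CHUNK))
    pos = 0
    for _ in range(N_CHUNKS):
        data[pos:pos + len(CHUNK)] = CHUNK
        pos += len(CHUNK)
    return data

for fn in (concat_bytes, join_list, prealloc_bytearray):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```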

Jacob Stern
5

You may need to call conn.recv() multiple times to receive all the data. Calling it a single time is not guaranteed to bring in all the data that was sent, due to the fact that TCP streams don't maintain frame boundaries (i.e. they only work as a stream of raw bytes, not a structured stream of messages).

See this answer for another description of the issue.

Note that this means you need some way of knowing when you have received all of the data. If the sender will always send exactly 8000 bytes, you could count the number of bytes you have received so far and subtract that from 8000 to know how many are left to receive; if the data is variable-sized, there are various other methods that can be used, such as having the sender send a number-of-bytes header before sending the message, or if it's ASCII text that is being sent you could look for a newline or NUL character.
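
The delimiter approach mentioned above can be sketched as follows. This is a minimal illustration, assuming newline-terminated messages; the `LineReader` class is a hypothetical helper, not part of the `socket` module. The key detail is that bytes received after a delimiter are kept for the next call instead of being thrown away.

```python
import socket

class LineReader:
    """Hypothetical helper: reads newline-terminated messages from a stream socket."""

    def __init__(self, sock, bufsize=4096):
        self.sock = sock
        self.bufsize = bufsize
        self.buffer = bytearray()   # bytes received but not yet returned

    def read_line(self):
        """Return the next message without its trailing newline, or None at EOF."""
        while True:
            newline_at = self.buffer.find(b"\n")
            if newline_at != -1:
                line = bytes(self.buffer[:newline_at])
                del self.buffer[:newline_at + 1]   # keep any bytes after the delimiter
                return line
            chunk = self.sock.recv(self.bufsize)
            if not chunk:
                return None   # connection closed; an unterminated partial line is dropped
            self.buffer.extend(chunk)

a, b = socket.socketpair()
a.sendall(b"ls -la\nwho")   # one full message plus the start of the next
a.sendall(b"ami\n")
a.close()

reader = LineReader(b)
print(reader.read_line())
print(reader.read_line())
print(reader.read_line())
b.close()
```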

Community
Jeremy Friesner
4

Disclaimer: Cases in which you really need to do this are very rare. If possible, use an existing application-layer protocol or define your own, e.g. precede each message with a fixed-length integer indicating the length of the data that follows, or terminate each message with a '\n' character. (Adam Rosenfield's answer does a really good job of explaining that.)

With that said, there is a way to read all of the data available on a socket. However, it is a bad idea to rely on this kind of communication, as it introduces the risk of losing data. Use this solution with extreme caution and only after reading the explanation below.

def recvall(sock):
    BUFF_SIZE = 4096
    data = bytearray()
    while True:
        packet = sock.recv(BUFF_SIZE)
        if not packet:  # Important!!
            break
        data.extend(packet)
    return data

Now the if not packet: line is absolutely critical! Many answers here suggest using a condition like if len(packet) < BUFF_SIZE:, which is broken and will most likely cause you to close your connection prematurely and lose data. It wrongly assumes that one send on one end of a TCP socket corresponds to one receive of the same number of bytes on the other end. It does not. There is a very good chance that sock.recv(BUFF_SIZE) will return a chunk smaller than BUFF_SIZE even if there's still data waiting to be received. There is a good explanation of the issue here and here.

By using the above solution you are still risking data loss if the other end of the connection is writing data slower than you are reading. You may just simply consume all data on your end and exit when more is on the way. There are ways around it that require the use of concurrent programming, but that's another topic of its own.
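
A loop that reads until `recv()` returns an empty result only finishes when the peer closes or half-closes the connection. Here is a minimal sketch of the half-close pattern, assuming a request/response exchange over a local `socket.socketpair()`; `recv_until_eof` and `echo_upper_server` are hypothetical names used only for this illustration.

```python
import socket
import threading

def recv_until_eof(sock, bufsize=4096):
    data = bytearray()
    while True:
        packet = sock.recv(bufsize)
        if not packet:   # empty result: the peer will send nothing more
            break
        data.extend(packet)
    return bytes(data)

def echo_upper_server(conn):
    """Hypothetical server: read the whole request, then answer and close."""
    request = recv_until_eof(conn)
    conn.sendall(request.upper())
    conn.close()

client, server_side = socket.socketpair()
server = threading.Thread(target=echo_upper_server, args=(server_side,))
server.start()

client.sendall(b"hello " * 10_000)
client.shutdown(socket.SHUT_WR)   # half-close: done sending, still able to receive
reply = recv_until_eof(client)
server.join()
client.close()
print(len(reply), reply[:12])
```

The trade-off is that each direction can carry only one message per connection, which is why a length prefix or delimiter is usually the better choice for an interactive shell like the one in the question.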

zamkot
2

A variation using a generator function (which I consider more pythonic):

def recvall(sock, buffer_size=4096):
    buf = sock.recv(buffer_size)
    while buf:
        yield buf
        if len(buf) < buffer_size: break
        buf = sock.recv(buffer_size)
# ...
with socket.create_connection((host, port)) as sock:
    sock.sendall(command)
    response = b''.join(recvall(sock))
Shadur
yoniLavi
  • That one does not appear to work if the response is smaller than the buffer size. – Shadur Nov 06 '17 at 13:12
  • @Shadur, that's interesting, what happens when you try it? can you please share the code to reproduce the issue? As written, `recvall` should yield the contents of each buffer received regardless of the size as long as it's not empty. – yoniLavi Nov 06 '17 at 13:45
  • 2
    Judging by debug statements added, it inhales the entire response in the first chunk, then hangs waiting for the next chunk. The 'chunck' answer below has the same problem, I wound up fixing it with a second test to see if chunck's length was less than the buffer size. I'll test whether that fixes your solution as well. -- EDIT: It did. – Shadur Nov 06 '17 at 13:47
2

You can do it using Serialization

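import json

def recvall(conn):
    data = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:
            return None  # connection closed before a complete JSON document arrived
        data += chunk  # accumulate; a single recv() may hold only part of the document
        try:
            return json.loads(data.decode("utf-8"))
        except ValueError:
            continue  # incomplete JSON so far; keep reading

def sendall(conn, data):
    conn.sendall(json.dumps(data).encode("utf-8"))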

NOTE: If you want to share a file using the code above, you need to encode it to base64 before sending and decode it after receiving.
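
The base64 step is needed because JSON cannot carry raw bytes. A minimal sketch, assuming a message layout with `filename` and `content_b64` keys (both key names and both function names are made up for this example):

```python
import base64
import json

def encode_file_message(filename, content):
    """Build a JSON message (as bytes) that carries binary content as base64 text."""
    payload = {
        "filename": filename,
        "content_b64": base64.b64encode(content).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")

def decode_file_message(raw):
    """Inverse of encode_file_message: return (filename, original bytes)."""
    payload = json.loads(raw.decode("utf-8"))
    return payload["filename"], base64.b64decode(payload["content_b64"])

raw = encode_file_message("blob.bin", bytes(range(256)))
name, content = decode_file_message(raw)
print(name, len(content))
```

Base64 makes the payload about a third larger, so for big files a length-prefixed binary protocol is likely to be cheaper.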

Connor
1

I think this question has been pretty well answered, but I just wanted to add a method using Python 3.8 and the new assignment expression (walrus operator) since it is stylistically simple.

import socket

host = "127.0.0.1"
port = 31337
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host,port))
s.listen()
con, addr = s.accept()
msg_list = []

while (walrus_msg := con.recv(3)) != b'\r\n':
    msg_list.append(walrus_msg)

print(msg_list)

In this case, up to 3 bytes are received from the socket on each iteration and immediately assigned to walrus_msg. The loop ends once a recv() call returns exactly b'\r\n'. Each walrus_msg is appended to msg_list, which is printed after the loop ends. This script is basic but was tested and works with a telnet session.

NOTE: The parentheses around (walrus_msg := con.recv(3)) are needed. Without them, while walrus_msg := con.recv(3) != b'\r\n': assigns the result of the comparison (True or False) to walrus_msg instead of the actual data from the socket.
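
The precedence issue can be checked without a socket. In this small sketch (Python 3.8 or later), an iterator of byte strings stands in for successive `recv()` results; that stand-in is an assumption made purely for illustration.

```python
chunks = iter([b"abc", b"abc"])

# With parentheses: the name is bound to the received value, then compared.
if (with_parens := next(chunks)) != b"\r\n":
    print(repr(with_parens))

# Without parentheses: the comparison runs first, so the name is bound to its result.
if without_parens := next(chunks) != b"\r\n":
    print(repr(without_parens))
```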

DJSDev
0

Modifying Adam Rosenfield's code:

def send_msg(sock, msg):
    # Package layout: payload length in bytes, ":", then the payload (msg must be bytes)
    package = str(len(msg)).encode("ascii") + b":" + msg
    sock.sendall(package)

def recv_msg(sock):
    # Read the header one byte at a time so we never read past the ":" separator
    header = b""
    while not header.endswith(b":"):
        byte = sock.recv(1)
        if not byte:
            return None  # connection closed before the header was complete
        header += byte
    size_of_package = int(header[:-1])

    # recv() may return fewer bytes than requested, so loop until the payload is complete
    message = bytearray()
    while len(message) < size_of_package:
        chunk = sock.recv(size_of_package - len(message))
        if not chunk:
            return None  # connection closed mid-message
        message.extend(chunk)
    return bytes(message)

I would, however, heavily encourage using the original approach.

sjMoquin
0

For anyone else looking for an answer for cases where you don't know the length of the message in advance: here's a simple solution that reads 4096 bytes at a time and stops when fewer than 4096 bytes are received. However, it will not work when the total length of the data received is an exact multiple of 4096 bytes, because it will then call recv() again and hang. It also assumes that recv() only returns a short chunk at the end of the data, which TCP does not guarantee.

def recvall(sock):
    data = b''
    bufsize = 4096
    while True:
        packet = sock.recv(bufsize)
        data += packet
        if len(packet) < bufsize:
            break
    return data
anaotha
0

This code reads up to 1024*32 (= 32768) bytes from the server in at most 32 recv() calls:

jsonString = bytearray()

for _ in range(32):
    packet = clisocket.recv(1024)
    if not packet:
        break
    jsonString.extend(packet)

The received data ends up in the jsonString variable.

0

Plain and simple:

data = b''
while True:
    data_chunk = client_socket.recv(1024)
    if data_chunk:
        data += data_chunk
    else:
        break
Shady