What could cause the response body to be cut off (on client-side)?

Question

I am writing a Python language plugin for an active code generator that makes calls to our REST API. After many failed attempts to use the requests library, I opted to use the much lower-level socket and ssl modules, which have been working fine so far. I am using a very crude method to parse the responses; for fairly short response bodies this works fine, but I am now trying to retrieve much larger JSON objects (lists of users). The response is being cut off as follows (note: I removed a couple of user entries for the sake of brevity):
{"page-start":1,"total":5,"userlist":[{"userid":"jim.morrison","first-name":"Jim","last-name":"Morrison","language":"English","timezone":"(GMT+5:30)CHENNAI,KOLKATA,MUMBAI,NEW DELHI","currency":"US DOLLAR","roles":
There should be a few more users after this and the response body is on a single line in the console.

Here is the code I am using to request the user list from the Rest API server:

import socket, ssl, json

host = self.WrmlClientSession.api_host
port = 8443
pem_file = "<pem file>"

url = self.WrmlClientSession.buildURI(host, port, '<root path>')

#Create the header
http_header = 'GET {0} HTTP/1.1\n\n'
req = http_header.format(url)

#Socket configuration and connection execution
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn = ssl.wrap_socket(sock, ca_certs = pem_file)
conn.connect((host, port))
conn.send(req)

response = conn.recv()
(headers, body) = response.split("\r\n\r\n")

#Here I would convert the body into a json object, but because the response is 
#cut off, it cannot be properly decoded.  
print(response)

Any insight into this matter would be greatly appreciated!

Edit: I forgot to mention that I debugged the response on the server-side, and everything was perfectly normal.

score 3 · Accepted Answer · answered Mar 09 '13 at 10:13

You can't assume that a single call to recv() will return all the data: it returns only what has arrived so far, up to the buffer size you ask for, and the TCP connection will only buffer a limited amount. Also, you're not parsing any of the headers to determine the size of the body that you're expecting. You could use a non-blocking socket and keep reading until a read would block, which will mostly work but is simply not reliable and is quite poor practice, so I'm not going to document it here.

HTTP has ways of indicating the size of the body for exactly this reason, and the correct approach is to use them if you want your code to be reliable. There are two things to look for. Firstly, if the HTTP response has a Content-Length header then that indicates how many bytes will occur in the response body - you need to keep reading until you've had that much. The second option is that the server may send you a response which uses chunked encoding - it indicates this by including a Transfer-Encoding header whose value will contain the text chunked. I won't go into chunked encoding here; read the Wikipedia article for details. In essence the body contains a small header for each "chunk" of data which indicates the size of that chunk. In this case you have to keep reading chunks until you get an empty one, which indicates the end of the response. This approach is used instead of Content-Length when the size of the response body isn't known by the server when it starts to send it.
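As a rough illustration of the Content-Length case, something like the following should work. This is only a sketch, assuming Python 3: read_response and its recv parameter are names I've made up here (you would pass conn.recv from the code in the question), and it deliberately gives up if there is no Content-Length header rather than trying to handle chunked encoding.

```python
def read_response(recv, bufsize=4096):
    """Read one HTTP response whose body size is given by Content-Length.

    `recv` is any callable that behaves like socket.recv: it takes a maximum
    byte count and returns b"" once the peer has closed the connection.
    (Hypothetical helper for illustration, not part of any library.)
    """
    data = b""
    # 1. Keep reading until the blank line that ends the headers has arrived.
    while b"\r\n\r\n" not in data:
        piece = recv(bufsize)
        if not piece:
            raise ConnectionError("connection closed before headers were complete")
        data += piece
    head, _, body = data.partition(b"\r\n\r\n")

    # 2. Parse the header lines (skipping the status line) into a dict
    #    keyed by lower-cased header name.
    headers = {}
    for line in head.decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if "content-length" not in headers:
        raise ValueError("no Content-Length header; the body may be chunked")
    length = int(headers["content-length"])

    # 3. Keep reading until the whole body has arrived.
    while len(body) < length:
        piece = recv(bufsize)
        if not piece:
            raise ConnectionError("connection closed before body was complete")
        body += piece
    return headers, body[:length]
```

With the question's code this would be called as read_response(conn.recv), and the returned body can then be decoded and handed to json.loads.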

Typically a server won't use both Content-Length and chunked encoding, but there's nothing to actually stop it, so that's also something to consider (the HTTP/1.1 specification says Transfer-Encoding takes precedence when both are present). If you only need to interoperate with a specific server then you can just check what it does and work with that, but be aware you'll be making your code less portable and more fragile to future changes.
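To make that decision explicit, here is a small sketch of how a client might pick the framing from the headers. body_framing is a hypothetical helper name, it assumes the headers have already been parsed into a dict with lower-cased names, and it ignores responses that never have a body (such as replies to HEAD requests or 204 and 304 statuses).

```python
def body_framing(headers):
    """Decide how the response body is delimited.

    `headers` maps lower-cased header names to string values.
    Returns ("chunked", None), ("length", n) or ("until-close", None).
    (Hypothetical helper for illustration only.)
    """
    encodings = [
        token.strip().lower()
        for token in headers.get("transfer-encoding", "").split(",")
        if token.strip()
    ]
    if encodings and encodings[-1] == "chunked":
        # Transfer-Encoding wins even if Content-Length is also present.
        return ("chunked", None)
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    # Neither header: the body runs until the server closes the connection.
    return ("until-close", None)
```

The last case is why a reply with neither header can only be read by looping until recv() returns an empty result.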

Note that when using these headers, you'll still need to read in a loop because any given read operation may return incomplete data - TCP's flow control makes the sender pause when the receive buffer is full and resume only once the reading application has started to empty it, so this isn't something you can work around. Also note that a single read may not even contain a complete chunk, so you need to keep track of the size of the current chunk and how much of it you've already seen. You only know to read the next chunk header when you've seen the number of bytes specified by the previous chunk header.
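For the chunked case, the bookkeeping described above could be sketched roughly as follows. Again this assumes Python 3; read_chunked_body, recv and leftover are names invented for this example, where leftover is whatever body data was already read along with the headers. The internal buffer is what carries the state between reads.

```python
def read_chunked_body(recv, leftover=b"", bufsize=4096):
    """Decode a body sent with Transfer-Encoding: chunked.

    `recv` behaves like socket.recv; `leftover` holds any body bytes that
    were already read past the end of the headers.
    (Hypothetical helper for illustration, not part of any library.)
    """
    buf = leftover

    def fill():
        # Pull more data from the connection into the buffer.
        nonlocal buf
        piece = recv(bufsize)
        if not piece:
            raise ConnectionError("connection closed in the middle of the body")
        buf += piece

    def read_line():
        # Return the next CRLF-terminated line, reading more if needed.
        nonlocal buf
        while b"\r\n" not in buf:
            fill()
        line, _, buf = buf.partition(b"\r\n")
        return line

    def read_exact(count):
        # Return exactly `count` bytes, reading more if needed.
        nonlocal buf
        while len(buf) < count:
            fill()
        out, buf = buf[:count], buf[count:]
        return out

    body = b""
    while True:
        # Chunk header: size in hex, optionally followed by ";extension".
        size = int(read_line().split(b";")[0], 16)
        if size == 0:
            break  # an empty chunk marks the end of the body
        body += read_exact(size)
        read_exact(2)  # the CRLF that follows the chunk data
    # Skip any trailer headers, up to and including the final blank line.
    while read_line():
        pass
    return body
```

Because every read goes through the buffer, it doesn't matter whether a given recv() delivers half a chunk header or several chunks at once.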

Of course, you don't have to worry about any of this if you use any of Python's many HTTP libraries. Speaking as someone who's had to implement a fairly complete HTTP/1.1 client before, you really want to let someone else do it if you possibly can - there are quite a few tricky corner cases to consider and your simple code above is going to fail in a lot of cases. If requests doesn't work for you, have you tried any of the standard Python libraries? There's urllib and urllib2 for higher-level interfaces, and httplib provides a lower-level approach which you might find allows you to work around some of your problems (in Python 3 these became urllib.request and http.client).
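For comparison, here is roughly what the request from the question might look like with the standard library doing the work. This is a sketch assuming Python 3's http.client (the renamed httplib); make_connection and fetch_json are names I've invented, the Accept header is my own addition, and the host, port, PEM file and path are whatever your plugin already has.

```python
import http.client
import json
import ssl


def make_connection(host, port, pem_file):
    """Open an HTTPS connection that verifies the server against pem_file."""
    context = ssl.create_default_context(cafile=pem_file)
    return http.client.HTTPSConnection(host, port, context=context)


def fetch_json(conn, path):
    """GET `path` on an http.client connection and decode the JSON body."""
    conn.request("GET", path, headers={"Accept": "application/json"})
    response = conn.getresponse()
    # read() returns the complete body: http.client deals with
    # Content-Length and chunked transfer encoding internally.
    body = response.read()
    if response.status != 200:
        raise RuntimeError("unexpected status %d" % response.status)
    return json.loads(body.decode("utf-8"))
```

Usage would be along the lines of fetch_json(make_connection(host, 8443, pem_file), path), with no manual header parsing or read loop.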

Remember that you can always modify the code in these (after copying to your local repository of course) if you really have to fix issues, or possibly just import them and monkey-patch your changes in. You'd have to be quite clear it was an issue in the library and not just a mistaken use of it, though.

If you really want to implement an HTTP client that's fine, but just be aware that it's harder than it looks.

As a final aside, I've always used the read() method of SSL sockets instead of recv() - I'd hope they'd be equivalent, but you may wish to try that if you're still having issues.
