Say I have the following HTTP request:

GET /4 HTTP/1.1
Host: graph.facebook.com

And the server returns the following response:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Cache-Control: private, no-cache, no-store, must-revalidate
Content-Type: text/javascript; charset=UTF-8
ETag: "539feb8aee5c3d20a2ebacd02db380b27243b255"
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
X-FB-Rev: 1070755
X-FB-Debug: pC4b0ONpdhLwBn6jcabovcZf44bkfKSEguNsVKuSI1I=
Date: Wed, 08 Jan 2014 01:22:36 GMT
Connection: keep-alive
Content-Length: 172

{"id":"4","name":"Mark Zuckerberg","first_name":"Mark","last_name":"Zuckerberg","link":"http:\/\/www.facebook.com\/zuck","username":"zuck","gender":"male","locale":"en_US"}

Since the value of the Content-Length header depends on the length of the content, I cannot simply split on the string Content-Length: 172. How can I extract the JSON and the headers separately? Both are important to my program. I am using this code to get the response:

import json
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("graph.facebook.com", 80))
# id holds the numeric user ID, e.g. 4
s.send("GET /"+str(id)+"/picture HTTP/1.1\r\nHost: graph.facebook.com\r\n\r\n")
data = s.recv(1024)
s.close()
json_string = (somehow extract this)
userdata = json.loads(json_string)
735Tesla
  • a) Wouldn't that be `\r\n\r\n`? b) I was looking to do this all in one line and a bit more gracefully. But thanks for the suggestion. – 735Tesla Jan 08 '14 at 01:46
  • Depends on your server OS, but you can use the `|` operator. A quick Google search reveals [this](http://stackoverflow.com/questions/1331815/regular-expression-to-match-cross-platform-newline-characters) – tenub Jan 08 '14 at 01:47
  • I would probably use the requests library and do `somerequest.json()`: http://docs.python-requests.org/en/latest/ – erewok Jan 08 '14 at 01:48
  • @erewok is this supported in Python 2.7 too? – 735Tesla Jan 08 '14 at 01:51
  • @735Tesla: `requests` is supported on Python 2.7, but it's a third-party install. And there is absolutely no need for it here; `urllib2` in the stdlib will be just as easy for your use. – abarnert Jan 08 '14 at 01:56

1 Answer

The easy way to do this is to use an HTTP library. For example:

import json
import urllib2

r = urllib2.urlopen("http://graph.facebook.com/{}/picture".format(id))
json_string = r.read()
userdata = json.loads(json_string)

If you really want to parse it yourself, the HTTP protocol guarantees that headers and body are separated by an empty line, and that this will be the first empty line anywhere in the response, so it's not that hard:

data = s.recv(1024)
header, _, json_string = data.partition('\r\n\r\n')
userdata = json.loads(json_string)
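
Since the question asks for the headers as well as the JSON, the `header` half can be broken down further in the same way. The following is only a sketch: `split_response` is a hypothetical helper name, the sample bytes are a shortened stand-in for the response in the question, and duplicate header names simply overwrite each other. It works on bytes so that it runs on both Python 2.7 and Python 3.

```python
import json


def split_response(raw):
    """Split raw HTTP response bytes into (status line, headers dict, body bytes)."""
    # The first empty line separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    status_line = lines[0].decode("iso-8859-1")
    headers = {}
    for line in lines[1:]:
        # Each header line looks like "Name: value"; names are case-insensitive,
        # so store them lowercased. A repeated name overwrites the earlier one.
        name, _, value = line.partition(b":")
        key = name.strip().lower().decode("iso-8859-1")
        headers[key] = value.strip().decode("iso-8859-1")
    return status_line, headers, body


# Shortened stand-in for the response shown in the question.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/javascript; charset=UTF-8\r\n"
          b"Content-Length: 24\r\n"
          b"\r\n"
          b'{"id":"4","name":"Mark"}')

status_line, headers, body = split_response(sample)
userdata = json.loads(body.decode("utf-8"))
```

After this runs, `headers["content-type"]` holds the content type string and `userdata["name"]` holds the decoded name.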

There are some obvious downsides to this. As written, your code won't work if the response is longer than 1K, or if the kernel doesn't give you the whole response in a single recv (which it's never guaranteed to do), or if the server redirects you or gives you a 100 Continue before the real response, or if the server decides to send back a chunked or MIME-multipart or other response instead of a flat body, or…
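
The first two of those problems (a response larger than one buffer, or one that arrives in several pieces) can be worked around by calling `recv` in a loop. The sketch below is illustrative only: `recv_response` is a hypothetical helper, it assumes the server sends a `Content-Length` header, and it still does not handle redirects, 100 Continue, or chunked bodies.

```python
import socket


def recv_response(sock, bufsize=1024):
    """Read one HTTP response from sock; return (header bytes, body bytes)."""
    data = b""
    # Keep reading until the blank line that ends the header block has arrived.
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ValueError("connection closed before the headers were complete")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")

    # Look up Content-Length (header names are case-insensitive).
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip().decode("ascii"))
    if length is None:
        raise ValueError("no Content-Length header; this sketch cannot tell where the body ends")

    # Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ValueError("connection closed before the body was complete")
        body += chunk
    return head, body[:length]
```

With the socket from the question, `head, json_string = recv_response(s)` would replace the single `s.recv(1024)` call. An HTTP library remains the more robust choice.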

abarnert
  • What is the purpose of the `, _,` in `header, _,`? – 735Tesla Jan 08 '14 at 01:56
  • @735Tesla: [`str.partition`](http://docs.python.org/2.7/library/stdtypes.html#str.partition) returns three values: the part before the separator, the separator, and the part after the separator. Often you don't need the middle one (you know it's just going to be `'\r\n\r\n'` here…). Assigning don't-care values to `_` is a common idiom in Python: just readable enough that you can tell there's a value there, but unobtrusive enough to signal that the value doesn't matter beyond noting its existence. – abarnert Jan 08 '14 at 01:58
  • Thanks I never heard of using `_` that way before. +1 – 735Tesla Jan 08 '14 at 02:01
  • This is much better than my answer. +1 – aIKid Jan 08 '14 at 02:02
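
The `str.partition` behavior discussed in the comments can be checked on a small string. The sample text here is made up for illustration. Note what happens when the separator is missing, which is what a truncated response with no blank line yet would look like.

```python
data = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\n{}"

# Three values come back: before the separator, the separator itself, and after it.
header, sep, body = data.partition("\r\n\r\n")

# If the separator never occurs, everything lands in the first slot
# and the other two are empty strings.
incomplete = "HTTP/1.1 200 OK\r\nContent-Length: 2"
only_head, missing_sep, empty_body = incomplete.partition("\r\n\r\n")
```

An empty `sep` is therefore a cheap way to detect that the header block has not been fully received.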