13

I have a socket open and I'd like to read some JSON data from it. The problem is that the json module from the standard library can only parse from strings (load just reads the whole file and calls loads internally). It even looks like everything inside the module depends on the parameter being a string.

This is a real problem with sockets, since you can never read the whole stream into a string and you don't know how many bytes to read before you actually parse it.

So my questions are: Is there a (simple and elegant) workaround? Is there another json library that can parse data incrementally? Is it worth writing it myself?

Edit: It is the XBMC JSON-RPC API. There are no message envelopes, and I have no control over the format. Each message may be on a single line or spread over several lines. I could write a simple parser that needs only a getc function in some form and feed it using s.recv(1), but this doesn't seem like a very pythonic solution and I'm a little too lazy to do that :-)

cube
  • 5
    Does this socket stream include envelopes? Most socket protocols give you some idea of the size of the content coming down the stream. Are you trying to connect to a well known json socket protocol? Do you have control of the socket protocol? The simplest way is to know the size of each message (like HTTP has Content-Length headers). Otherwise you have to parse the data as it comes in to know when it starts and when it ends and the standard library can't help you. – six8 Sep 07 '11 at 16:48

7 Answers

7

Edit: given that you aren't defining the protocol, this isn't useful, but it might be useful in other contexts.


Assuming it's a stream (TCP) socket, you need to implement your own message framing mechanism (or use an existing higher level protocol that does so). One straightforward way is to define each message as a 32-bit integer length field, followed by that many bytes of data.

Sender: take the length of the JSON packet, pack it into 4 bytes with the struct module, send it on the socket, then send the JSON packet.

Receiver: Repeatedly read from the socket until you have at least 4 bytes of data, use struct.unpack to unpack the length. Read from the socket until you have at least that much data and that's your JSON packet; anything left over is the length for the next message.
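The sender and receiver steps above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `send_msg`, `recv_exact` and `recv_msg` are made up for this example, and the choice of an unsigned 32-bit big-endian length (`"!I"`) and UTF-8 encoding are assumptions. Because `recv_exact` never reads past the current message, there is no leftover data to carry into the next one.

```python
import json
import socket
import struct

# Assumed header layout: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def send_msg(sock, obj):
    """Send one JSON message preceded by its 4-byte length."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed JSON message and decode it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return json.loads(recv_exact(sock, length).decode("utf-8"))
```

Both ends have to agree on this framing, so it only helps when you control the protocol.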

If at some point you're going to want to send messages that consist of something other than JSON over the same socket, you may want to send a message type code between the length and the data payload; congratulations, you've invented yet another protocol.

Another, slightly more standard, method is DJB's Netstrings protocol; it's very similar to the system proposed above, but with text-encoded lengths instead of binary; it's directly supported by frameworks such as Twisted.
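For illustration, a netstring is the payload length in ASCII decimal, a colon, the payload bytes, and a trailing comma. The sketch below is a hand-rolled encoder and buffer decoder, not Twisted's implementation; the function names `encode_netstring` and `decode_netstrings` are hypothetical, and it skips the size limits a production parser would enforce.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Wrap payload as b'<length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstrings(buffer: bytes):
    """Return (list of complete payloads, unconsumed remainder)."""
    messages = []
    while True:
        colon = buffer.find(b":")
        if colon == -1:
            break  # length field not complete yet
        length = int(buffer[:colon])
        end = colon + 1 + length
        if len(buffer) <= end:
            break  # payload or trailing comma not received yet
        if buffer[end:end + 1] != b",":
            raise ValueError("malformed netstring")
        messages.append(buffer[colon + 1:end])
        buffer = buffer[end + 1:]
    return messages, buffer
```

A receiver would append each `recv()` result to the remainder, call `decode_netstrings` again, and pass every returned payload to `json.loads`.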

Russell Borogove
5

If you're getting the JSON from an HTTP stream, use the Content-Length header to get the length of the JSON data. For example:

import http.client  # named httplib in Python 2
import json

h = http.client.HTTPConnection('graph.facebook.com')
h.request('GET', '/19292868552')
response = h.getresponse()
content_length = int(response.getheader('Content-Length', '0'))

# Read data until we've read Content-Length bytes or the socket is closed.
# If no Content-Length was sent (0), read until the server closes the connection.
data = b''
while content_length == 0 or len(data) < content_length:
    if content_length:
        s = response.read(content_length - len(data))
    else:
        s = response.read()
    if not s:
        break
    data += s

# We now have the full data -- decode it
j = json.loads(data.decode('utf-8'))
print(j)
Adam Rosenfield
3

What you want(ed) is ijson, an incremental JSON parser. It is available here: https://pypi.python.org/pypi/ijson/ . The usage should be as simple as (copying from that page):

import ijson.backends.python as ijson

for item in ijson.items(file_obj, ''):  # second argument is the prefix; '' is the top-level value
    ...  # process item

(For those who prefer something self-contained, in the sense that it relies only on the standard library: I wrote a small wrapper around json yesterday, but only because I didn't know about ijson. It is probably much less efficient.)

EDIT: since I found out that in fact (a cythonized version of) my approach was much more efficient than ijson, I have packaged it as an independent library - see here also for some rough benchmarks: http://pietrobattiston.it/jsaone

Pietro Battiston
2

Do you have control over the json? Try writing each object as a single line. Then do a readline call on the socket as described here.

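import json

infile = sock.makefile()  # file-like wrapper around the connected socket

while True:
    line = infile.readline()
    if not line:  # an empty string means the peer closed the connection
        break
    # ...
    result = json.loads(line)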
Gringo Suave
0
res = str(s.recv(4096), 'utf-8') # Getting a response as string
res_lines = res.splitlines() # Split the string to an array
last_line = res_lines[-1] # Normally, the last one is the json data
pair = json.loads(last_line)

https://github.com/A1vinSmith/arbitrary-python/blob/master/sockets/loopHost.py

Alvin Smith
  • This will not work because: 1) The json object can be longer than 4kB 2) The object can be spread on more than one line – cube Oct 04 '20 at 19:58
  • Yeah, this is more of a sample than a complete solution. You can either use another library or add a regex here. – Alvin Smith Oct 05 '20 at 05:31
  • And if you're not limited to raw sockets, `requests` would make things much easier via `response.json()`. https://github.com/A1vinSmith/arbitrary-python/blob/master/requests/loopHost.py – Alvin Smith Oct 05 '20 at 05:32
0

Skimming the XBMC JSON RPC docs, I think you want an existing JSON-RPC library - you could take a look at: http://www.freenet.org.nz/dojo/pyjson/

If that's not suitable for whatever reason, it looks to me like each request and response is contained in a JSON object (rather than a loose JSON primitive that might be a string, array, or number), so the envelope you're looking for is the '{ ... }' that defines a JSON object.

I would, therefore, try something like (pseudocode):

while not dead:
    read from the socket and append it to a string buffer
    set a depth counter to zero
    walk each character in the string buffer:
        if you encounter a '{':
            increment depth
        if you encounter a '}':
            decrement depth
            if depth is zero:
                remove what you have read so far from the buffer
                pass that to json.loads()
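One way the pseudocode above might look in Python is sketched below. It goes slightly beyond the pseudocode in two ways that are assumptions of this example: it tracks string literals and backslash escapes, so braces inside strings are not counted, and it treats `[`/`]` like `{`/`}` so top-level arrays are also split out. The function name `split_json_messages` is hypothetical.

```python
import json


def split_json_messages(buffer: str):
    """Return (decoded messages, unconsumed remainder) from a text buffer."""
    messages = []
    depth = 0          # current nesting level of objects/arrays
    start = None       # index where the current top-level value began
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True right after a backslash inside a string
    consumed = 0       # end index of the last complete message
    for i, ch in enumerate(buffer):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            if depth == 0:
                start = i
            depth += 1
        elif ch in "}]" and depth > 0:
            depth -= 1
            if depth == 0:
                messages.append(json.loads(buffer[start:i + 1]))
                consumed = i + 1
    return messages, buffer[consumed:]
```

In a receive loop, each decoded `recv()` chunk would be appended to the remainder and the function called again. Like the pseudocode, it rescans the buffer from the start on every call, which is simple but wasteful for very large messages.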
Russell Borogove
  • 2
    This could work, but I'd still have to parse strings (at least opening and closing quote and escaped quote) because of strings like this: `"\"{"`. – cube Sep 15 '11 at 12:20
  • I've tried a few JSONRPC libraries. Most of them are too complicated, can't run on a raw TCP connection, or can only run as a server. The only JSON parser I've tried was the one from standard library. – cube Sep 15 '11 at 20:16
  • I guess I'm still not clear on the parameters of your problem. Is your app in the client or server role? Are you expecting to use the raw TCP connection for more than just JSON RPC? – Russell Borogove Sep 15 '11 at 20:35
  • My app is a client. The server either responds to my method calls or sends notifications that go without a response. All messages sent on the socket are JSON objects or arrays. – cube Sep 15 '11 at 22:22
0

You may find JSON-RPC useful for this situation. It is a remote procedure call protocol that should allow you to call the methods exposed by the XBMC JSON-RPC. You can find the specification on Trac.

dman