
In a PHP program, I want to parse JSON incrementally. For example, given the partial JSON

[1, 2, {"id": 3},

I want to get 1, 2 and the dictionary even before the rest of the JSON input has been streamed. PHP's `json_decode` just returns `NULL`, and there doesn't seem to be a way to get the position of the error.
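For illustration, here is a minimal check of the behaviour described above. It uses only standard functions (`json_last_error()` exists since PHP 5.3, `json_last_error_msg()` since 5.5). The error code tells you *that* the input is malformed, but not *where*.

```php
<?php
// The partial input from the question: the outer array is not closed yet.
$partial = '[1, 2, {"id": 3},';

$result = json_decode($partial, true);

var_dump($result === null);                      // bool(true): not even 1 and 2 come back
var_dump(json_last_error() !== JSON_ERROR_NONE); // bool(true): an error is flagged...
echo json_last_error_msg(), "\n";                // ...but only as a generic message, no offset
```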

phihag
  • [Is there a streaming API for JSON?](http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json) is a similar question, but its content is quite different from what one might expect from the title, and it's language-independent, so most answers relate to Java. – phihag Sep 05 '11 at 18:05
  • +1. Wow, good question. Isn't it possible to just wait until the rest of the JSON has been streamed? – Madara's Ghost Sep 05 '11 at 18:05
  • @Rikudo Sennin Not if the stream is very slow (because it's generated by a slow process) or even infinite. – phihag Sep 05 '11 at 18:05
  • Are there limitations to this? For example, could it return a piece that ends in the middle of a number, such as `[1, 2, {"id": 3}, ..., 4` followed by `5, 46, 47, ...`, where `45` should be one number? – animuson Sep 05 '11 at 18:09
  • @genesis φ Because I want to deal with a remote endpoint that serves such a stream (and can take hours or forever to do so), and I want to show the first elements in the array while it's being streamed. If you have a better suggestion for a simple streaming protocol, I'm interested as well. – phihag Sep 05 '11 at 18:09
  • @phihag: can you tell me how that "streaming" is done? Do you mean while downloading, or something else? – genesis Sep 05 '11 at 18:11
  • @animuson Nope, the individual elements can be arbitrary. In practice, they're all objects/dictionaries. I wouldn't mind getting called or being able to access sub-elements as well, so that I could get 3 after `[1, 2, {"id": 3,` has been parsed. – phihag Sep 05 '11 at 18:11
  • @genesis φ It's actually pretty generic - the data is coming from pipes, a TCP or Unix socket, or over HTTP. *Downloading* pretty much captures it. – phihag Sep 05 '11 at 18:13
  • It's not difficult to adapt a custom JSON parser. Incomplete tokenizing shouldn't be a problem. (If it's really supposed to be a stream parser, then it would require a separate state object to recreate the parser recursion, keep references in the incomplete target array, etc.) – mario Sep 05 '11 at 18:13
  • @mario Yup, but it's even less difficult to use a full solution. I plan on writing one and thereby answer this question if there isn't one already out there. – phihag Sep 05 '11 at 18:15
  • Here's an idea: why not just use something like ZeroMQ and have the server push packets to the queue whenever it finishes processing some? Your code listening to the queue can then easily display them as soon as they come in. – Jani Hartikainen Sep 05 '11 at 19:32
  • @Jani Hartikainen Great idea. Unfortunately, that would require php-zeromq bindings, wouldn't it? Since hardly any shared host has these, PHP's main advantage of running everywhere would be lost. – phihag Sep 05 '11 at 19:47
  • Interesting problem. How are you receiving/reading the stream? – jlb Oct 27 '11 at 10:10
  • @jlb I have a file handle that another process writes to, or equivalently, a handle created by `proc_open`. – phihag Oct 27 '11 at 10:22

3 Answers


Update

I've written a small class that does character-by-character JSON input parsing: https://github.com/janeklb/JSONCharInputReader

It's fresh off the presses, so it probably has a few bugs. If you decide to try it out, let me know!

--

Could you break the stream up on each comma that's not part of a string value (while keeping track of `{`/`}` and `[`/`]` scope), and then process each token using `json_decode()`?

This solution works best if the JSON stream doesn't contain many large objects, since each element is only parsed once it has arrived in full.

Edit: if it does have large objects, this strategy could be modified to split a little 'deeper' instead, but the complexity would shoot up.
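A minimal sketch of the comma-splitting idea, assuming the stream is a single top-level JSON array (as in the question) and PHP 7.4+ for the typed properties. The class name `JsonArrayElementSplitter` and its `feed()` method are hypothetical, made up for this illustration; this is not the API of the JSONCharInputReader repository linked above. Besides bracket depth, it tracks whether it is inside a string and whether the previous character was a backslash, so commas and brackets inside string values are ignored.

```php
<?php
// Hypothetical helper (not JSONCharInputReader's API): splits a streamed
// top-level JSON array into elements and decodes each one with json_decode()
// as soon as its terminating top-level ',' or ']' has arrived.
final class JsonArrayElementSplitter
{
    private string $buffer = '';    // raw text of the element being read
    private int $depth = 0;         // nesting level inside the current element
    private bool $started = false;  // seen the opening '[' of the stream?
    private bool $finished = false; // seen the closing ']' of the stream?
    private bool $inString = false;
    private bool $escaped = false;

    /** Feeds the next chunk; returns the elements it completed. */
    public function feed(string $chunk): array
    {
        $elements = [];
        $length = strlen($chunk);
        // A byte-wise scan is safe for UTF-8: all structural characters are
        // ASCII, and multi-byte sequences only contain bytes >= 0x80.
        for ($i = 0; $i < $length && !$this->finished; $i++) {
            $c = $chunk[$i];
            if (!$this->started) {
                if ($c === '[') {
                    $this->started = true;
                } elseif (strpos(" \t\n\r", $c) === false) {
                    throw new UnexpectedValueException('Stream must start with [');
                }
                continue;
            }
            if ($this->inString) {
                // Inside a string, only an unescaped quote ends it.
                $this->buffer .= $c;
                if ($this->escaped) {
                    $this->escaped = false;
                } elseif ($c === '\\') {
                    $this->escaped = true;
                } elseif ($c === '"') {
                    $this->inString = false;
                }
                continue;
            }
            if ($this->depth === 0 && ($c === ',' || $c === ']')) {
                // Top-level separator or end of the array: element complete.
                $this->flush($elements);
                $this->finished = ($c === ']');
                continue;
            }
            if ($c === '"') {
                $this->inString = true;
            } elseif ($c === '{' || $c === '[') {
                $this->depth++;
            } elseif ($c === '}' || $c === ']') {
                $this->depth--;
            }
            $this->buffer .= $c;
        }
        return $elements;
    }

    private function flush(array &$elements): void
    {
        $text = trim($this->buffer);
        $this->buffer = '';
        if ($text === '') {
            return; // e.g. the empty array "[]"
        }
        $value = json_decode($text, true);
        // Check the error code, not the value: the literal null is valid JSON.
        if (json_last_error() !== JSON_ERROR_NONE) {
            throw new UnexpectedValueException('Invalid element: ' . $text);
        }
        $elements[] = $value;
    }
}

// Usage: the element {"id": 45} is split across two chunks.
$splitter = new JsonArrayElementSplitter();
foreach (['[1, 2, {"id": 3},', ' {"id": 4', '5}, "a,b"]'] as $chunk) {
    foreach ($splitter->feed($chunk) as $element) {
        echo json_encode($element), "\n";
    }
}
// prints 1, 2, {"id":3}, {"id":45} and "a,b", one per line
```

Because an element is only decoded once its terminating `,` or `]` has arrived, a number split across chunks (the `4` / `5` case raised in the question's comments) is reassembled before decoding. The trade-off is the one mentioned above: a large element is only returned once it has arrived in full. In the question's setting, the chunks could come from `fread()` on one of the pipes opened by `proc_open()`.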

jlb
  • Sure, that's precisely the solution I'd expect the library I'm searching for to implement. – phihag Oct 27 '11 at 10:36
  • @jlb It would indeed be rather easy. JSON is designed to be easy to parse. You would need to keep track of scope and unescaped double quotes, as far as I can see. – jwueller Oct 27 '11 at 10:43
  • @philhag Could you link me to the code that reads from the stream? (Not the stream itself; I'll try to replicate that.) – jlb Oct 27 '11 at 14:03
  • @philhag Take a look at the GitHub repo I added above. – jlb Nov 01 '11 at 17:17
  • I've tried to use Rob's parser below to stream a remote API, but if the original JSON object is complex, it doesn't work very well. This one is much easier to use. – Nicola Peluchetti Feb 26 '15 at 15:28

I've written a SAX-like JSON streaming parser that should do the trick. Let me know if it works!

Rob Gonzalez

There's a simple workaround if each individual element is guaranteed to be received in its entirety; in other words, if you can never receive, e.g., just half of an object like this:

{"a": 1,

`json_decode()` returns `NULL` because the string you're passing to it is not valid JSON. Replace the trailing comma with a closing bracket and there you go:

[1, 2, {"id": 3}]

You can decode that now and wait for the rest of the stream to arrive later.
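A sketch of this workaround, using a hypothetical helper name `decodeArrayPrefix()`. It assumes the buffer holds the stream from its opening `[` and ends right after a complete top-level element; if the buffer ends in the middle of an element, the patched string is still invalid JSON and the helper returns `null`.

```php
<?php
// Hypothetical helper: closes the buffered prefix of a streamed JSON array
// with ']' and decodes it. Returns null if the prefix can't be completed
// this way (e.g. the buffer ends in the middle of an element).
function decodeArrayPrefix(string $buffer): ?array
{
    $prefix = rtrim($buffer);
    if (substr($prefix, -1) === ',') {
        // Replace the trailing comma with the closing bracket.
        $prefix = substr($prefix, 0, -1) . ']';
    }
    $decoded = json_decode($prefix, true);
    return is_array($decoded) ? $decoded : null;
}

echo json_encode(decodeArrayPrefix('[1, 2, {"id": 3},')), "\n"; // prints [1,2,{"id":3}]
```

Each call re-decodes the entire buffered prefix, so calling it after every chunk costs time proportional to everything received so far; that is where the quadratic behaviour pointed out in the comment below comes from.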

Narf
  • Unfortunately, I can get half an object. But even if that wasn't a problem; this solution is Θ(n²), albeit really simple. – phihag Sep 05 '11 at 19:49