
I've been scratching my head for days over this problem. I want to change the value of a key in a relatively big JSON string streamed from an HTTP request, and then stream the result to the client. Pretend this is the big JSON:

{
 "name":"George", 
 "country": {
      "home": "United States",
      "current": "Canada"
 }
}

And I want output like this, by changing country.current:

{
 "name":"George", 
 "country": {
      "home": "United States",
      "current": "Indonesia"
 }
}

The transformation is done within a restify handler:

const http = require('http');

let proxyHandler = function(req, res, next) {
  let proxyReq = http.request(opt, r => {
    r.on('data', data => {
      // transform here and send the data using res.write()
    });
    r.on('end', () => {
      // close the response object when the parsing ends
      res.end();
    });
  });
  proxyReq.end();
  next();
};

I cannot use JSON.parse because the JSON is too big to buffer, so I need to stream, parse, and transform it as it arrives. Is there any library out there that is able to do so?

I've tried using stream-json, however it's very slow when I combine it with a Transform stream. When I initiate a huge number of requests, it just crawls and then times out.

Because no Content-Length header is sent to the client, the server needs to close the stream.
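As far as I know, .pipe() handles that part: it calls end() on the destination when the source ends, unless { end: false } is passed. So the wiring itself would just be the line below, where transform is a placeholder for whatever stream does the rewrite:

// inside the http.request callback; `transform` is a placeholder stream
// pipe() calls res.end() for us when r ends, so no Content-Length is required
r.pipe(transform).pipe(res);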

UPDATE:

I understand that there are streaming JSON parsers. However, what I need is not only a parser, but also an emitter. The process would be:

JSON -> Parse (event based) -> Transform the parse events -> Emit the transformed JSON. All of it needs to be done with NodeJS streams.
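To make that concrete, here is roughly the shape I mean, sketched on Clarinet's SAX-style events (this is the kind of thing my own emitter does). It is only a sketch under assumptions: the document contains objects and primitive values only, no arrays, and replaceStream is a name I made up:

// Sketch of parse -> transform -> emit; objects with primitive values only.
const clarinet = require('clarinet');

function replaceStream(target, fn, out) {
  const stack = [];                       // keys from the root down to here
  const parser = clarinet.createStream(); // writable; pipe the JSON into it

  parser.on('openobject', key => {        // '{' plus the object's first key
    out.write('{');
    if (key !== undefined) {
      out.write(JSON.stringify(key) + ':');
      stack.push(key);
    }
  });
  parser.on('key', key => {               // each subsequent key at this depth
    out.write(',' + JSON.stringify(key) + ':');
    stack.push(key);
  });
  parser.on('value', value => {           // primitive value of the current key
    if (stack.join('.') === target) value = fn(value);
    out.write(JSON.stringify(value));
    stack.pop();                          // this key's value is finished
  });
  parser.on('closeobject', () => {
    out.write('}');
    stack.pop();                          // pop the key this object sat under
  });
  parser.on('end', () => out.end());      // close the client response
  return parser;
}

// In the handler: r.pipe(replaceStream('country.current', () => 'Indonesia', res));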

As I've mentioned above, I've used stream-json and written my own stack-based emitter, but it was slow and created backpressure when a lot of requests came in. What I'm asking is whether there's any Node library out there that can do this in one pass. Ideally, the library could be used like below:

// JSONTransform is a hypothetical library class
result
  .pipe(new JSONTransform('country.current', (val) => 'Indonesia'))
  .pipe(response)
  • Possible duplicate of [Parse large JSON file in Nodejs](http://stackoverflow.com/questions/11874096/parse-large-json-file-in-nodejs) – tu4n Oct 28 '16 at 20:33
  • I'm aware of [JSONStream](https://github.com/dominictarr/JSONStream) and [Clarinet](https://github.com/dscape/clarinet/blob/master/clarinet.js), which are indeed streaming parsers. What I meant was not **how to parse**, but **how to transform a value**, taking JSON as input and outputting transformed JSON with the value changed, using NodeJS streams. – Lynx Luna Oct 28 '16 at 21:25
  • In that SO thread josh detailed how to process a JSON stream line by line; using a similar tactic and knowledge of your file structure, you need to locate the part where you need to change the value – tu4n Oct 28 '16 at 22:13
  • Can the property you want to change be a property of an array item? Please specify, I might come up with something – tu4n Oct 28 '16 at 22:15

1 Answer


Suppose you want to change the property at foo.bar.jar.
The pseudo-steps could be as follows (a rough sketch of the idea as a stream follows the list):


  1. Buffer your data until you find an "{" tag (e.g. Buffer = 'foo: {')
  2. Get the property name from the Buffer, stream the Buffer down the response, and clear it
  3. Does the property name match 'foo'?

    • If not, continue streaming down the response until you find the closing "}" tag (skipping all properties in between), then repeat steps 1 to 3
    • If yes, do step 4
  4. Repeat steps 1 to 3, only this time check for a property name matching 'bar'
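
Here is a rough sketch of these steps as a Node Transform stream, restricted to keep it short: objects only (no arrays), and the target value is assumed to be a string. PathReplace is a made-up name, not an existing library:

const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

class PathReplace extends Transform {
  constructor(path, fn) {
    super();
    this.target = path;          // dotted path, e.g. 'country.current'
    this.fn = fn;                // old value in, new value out
    this.decoder = new StringDecoder('utf8');
    this.stack = [];             // keys of the objects we are inside
    this.key = null;             // last key read at the current depth
    this.expectValue = false;    // a ':' was seen, next token is a value
    this.inString = false;       // currently buffering a string literal
    this.escaped = false;        // last char in the literal was a backslash
    this.literal = '';           // the buffered literal, quotes included
  }

  _transform(chunk, enc, cb) {
    let out = '';
    for (const ch of this.decoder.write(chunk)) {
      if (this.inString) {
        this.literal += ch;
        if (this.escaped) this.escaped = false;
        else if (ch === '\\') this.escaped = true;
        else if (ch === '"') {                  // literal complete
          this.inString = false;
          const text = JSON.parse(this.literal);
          if (this.expectValue) {               // it was a value
            this.expectValue = false;
            const path = this.stack.concat(this.key).join('.');
            out += path === this.target
              ? JSON.stringify(this.fn(text))   // steps 3/4: path matched
              : this.literal;                   // pass through unchanged
          } else {                              // it was a key (step 2)
            this.key = text;
            out += this.literal;
          }
        }
        continue;
      }
      if (ch === '"') { this.inString = true; this.literal = '"'; continue; }
      if (ch === '{' && this.expectValue) {     // descend into a nested object
        this.stack.push(this.key);
        this.expectValue = false;
      } else if (ch === '}') {                  // closing "}" tag: ascend
        this.stack.pop();
        this.expectValue = false;
      } else if (ch === ':') this.expectValue = true;
      else if (ch === ',') this.expectValue = false;
      out += ch;
    }
    cb(null, out);
  }
}

// Usage: r.pipe(new PathReplace('country.current', () => 'Indonesia')).pipe(res);

Everything except string literals is forwarded as soon as it is read, so memory stays bounded by the longest single string in the document rather than by the document size.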
  • Using this solution, if the key is towards the end of a big stream, say 5MB, the client will close the connection. This is what happens at the moment with my slow transformer. The transformer should be fast enough not to stall the response, as the client times out in around 5 seconds. – Lynx Luna Oct 29 '16 at 07:12
  • Yeah, this is kind of a trade-off for low memory usage (processing speed will suffer). That's why you're using a stream instead of parsing the whole file, is it not? – tu4n Oct 29 '16 at 11:54
  • You can either increase the timeout, or upgrade your memory so it can handle big files – tu4n Oct 29 '16 at 11:57
  • The strategy above should be quite fast if implemented correctly, as it only runs in O(n) – tu4n Oct 29 '16 at 11:59