
I have a Moleculer-based microservice with an endpoint that outputs a large JSON payload (an array of tens of thousands of objects).

This is structured JSON, and I know beforehand what it is going to look like:

[ // ... tens of thousands of these
  {
    "fileSize": 1155624,
    "name": "Gyo v1-001.jpg",
    "path": "./userdata/expanded/Gyo v01 (2003)"
  },
  {
    "fileSize": 308145,
    "name": "Gyo v1-002.jpg",
    "path": "./userdata/expanded/Gyo v01 (2003) (Digital)"
  }
  // ... tens of thousands of these
]

I did some research on JSON streaming and made some headway: I know how to consume a Node.js ReadableStream client-side, and I know I can use oboe to parse the JSON stream.

To that end, this is the code in my Express-based app:


import { Request, Response, Router } from "express";
import oboe from "oboe";

const router = Router();

router.route("/getComicCovers").post(async (req: Request, res: Response) => {
  // fall back to an empty object if no extraction options were sent
  const extractionOptions =
    typeof req.body.extractionOptions === "object"
      ? req.body.extractionOptions
      : {};

  oboe({
    url: "http://localhost:3000/api/import/getComicCovers",
    method: "POST",
    body: {
      extractionOptions,
      walkedFolders: req.body.walkedFolders,
    },
  }).on("node", ".*", (data) => {
    // forward each parsed object to the client as soon as it arrives
    console.log(data);
    res.write(JSON.stringify(data));
  });
});

This is the endpoint in Moleculer:

getComicCovers: {
    rest: "POST /getComicCovers",
    params: {
        extractionOptions: "object",
        walkedFolders: "array",
    },
    async handler(
        ctx: Context<{
            extractionOptions: IExtractionOptions;
            walkedFolders: IFolderData[];
        }>
    ) {
        const comicBooksForImport = await getCovers(
            ctx.params.extractionOptions,
            ctx.params.walkedFolders
        );

        // comicBooksForImport is the aforementioned array of objects.
        // How do I stream it from here to the Express app object-by-object?
    },
},

My question is: how do I stream this gigantic JSON payload from the REST endpoint to the Express app so that I can parse it on the client end?

UPDATE

I went with a socket.io implementation, per @JuanCaicedo's suggestion. I have it set up on both the server and the client end.

However, I am having trouble with this piece of code:

map(walkedFolders, async (folder, idx) => {
    let foo = await extractArchive(extractionOptions, folder);

    let fo = new JsonStreamStringify({ foo });

    fo.pipe(res);
    if (+idx === walkedFolders.length - 1) {
        res.end();
    }
});

I get an `Error [ERR_STREAM_WRITE_AFTER_END]: write after end` error. I understand that this happens because the response is terminated before the next iteration attempts to pipe the updated value of `foo` (which is a stream) into the response.

How do I get around this?

frishi
  • It's not clear what you're asking. If you're going to send a single big piece of JSON, there's only one way to send it (as a gigantic chunk of JSON). You can either build that JSON in memory on the server and send it all at once, or you can build it dynamically in a loop and send it piece by piece with `res.write()`, following that with the closing `res.end(']')` to finish the sending. It would be the client that might decide to read it as a stream rather than read it all at once. – jfriend00 May 14 '21 at 02:08
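To make jfriend00's suggestion concrete, here is a minimal sketch of the piece-by-piece approach (assuming `comicBooksForImport` is the array already built on the server):

res.write("[");
comicBooksForImport.forEach((comic, index) => {
  // every element after the first needs a leading comma to keep the JSON valid
  res.write((index > 0 ? "," : "") + JSON.stringify(comic));
});
res.end("]");

The client can then start parsing as chunks arrive (for example with oboe) instead of waiting for the whole array.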

1 Answer


Are you asking for a general approach recommendation, or for support with the particular solution you have?

If it's the first, then I think your best bet for communicating between the server and the client is websockets, perhaps with something like Socket.io. A long-lived connection will serve you well here, since it will take a long time to transmit all of your data.

Then you can send data from the server to the client any time you like. At that point you can read your data on the server as a Node.js stream and emit the items one at a time.
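For illustration, a rough sketch of what that could look like with Socket.io (the port and the event names here are made up, and `getCovers(...)` is the helper already used in the question's handler):

import { Server } from "socket.io";

const io = new Server(3001);

io.on("connection", (socket) => {
  // the client asks for covers with the same params the REST endpoint took
  socket.on("getComicCovers", async ({ extractionOptions, walkedFolders }) => {
    const comics = await getCovers(extractionOptions, walkedFolders);
    for (const comic of comics) {
      socket.emit("comicCover", comic); // one object per event
    }
    socket.emit("importComplete"); // nothing left to send
  });
});

And on the client:

import { io } from "socket.io-client";

const socket = io("http://localhost:3001");
socket.emit("getComicCovers", { extractionOptions: {}, walkedFolders: [] });
socket.on("comicCover", (comic) => console.log(comic.name)); // arrives object-by-object
socket.on("importComplete", () => socket.disconnect());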

The problem with using Oboe and writing to the response on every node is that it requires a long-running response, and there is a high likelihood that the connection gets interrupted before you have sent all the data across.

JuanCaicedo
  • Yes, I am asking for a general implementation. I started off with oboe.js because I wanted to narrow down the question to specifically: _How to transmit a large JSON payload from node.js to a client_? I have looked at `ndjson` to frame each object and then try to stream it, but all of those approaches assume that the payload is a file (`fs.createReadStream` assumes a file path). My setup creates a massive JSON payload from an endpoint, and I am looking for ways to 1. transmit it so the client isn't waiting 8-10 minutes for the entire payload to be available, and 2. keep node from crashing – frishi May 15 '21 at 19:55
  • I went with your suggestion about `sockets` and ran into a problem with piping into the response stream. Can you review my code and see if you can help me sort it out? @JuanCaicedo – frishi May 27 '21 at 05:28
  • @frishi I doubt that you'll want to use a regular Express endpoint (and therefore `res`) at all, since that will set up a normal http request and response flow. You probably want to do something different to enable a call over websockets. Try taking a look at this tutorial to see if it can guide you on the server side implementation https://www.programwitherik.com/socket-io-tutorial-with-node-js-and-express/ – JuanCaicedo May 27 '21 at 16:28
  • Thanks! I did refactor my code to use sockets.io... and it was actually pretty easy and intuitive. Thanks for the pointers! – frishi Jun 03 '21 at 21:40
  • Glad to hear that! – JuanCaicedo Jun 03 '21 at 22:54