
I've already found *Event loop for large files?*, but it's mostly about downloads. The conclusion I take from that post is that node.js might have been adequate for downloads, but Nginx is a battle-hardened solution that "ain't broke."

But what about uploads? We have enormous files being uploaded. We do genomics, and human genome datasets are as much as 200GB in size. As far as I've been able to determine, Nginx always buffers the complete request, header and body, before forwarding it to a back-end. We've run out of memory handling three uploads at the same time.

We have a swarm of small, "does one thing and does it well" servers running in our application, one of which handles the uploads (and type transformations to an in-house format) of the genomic data, and another of which provides socket.io handling to keep customers apprised of both upload progress and other events going on in our application's ecology. Others handle authentication, customer data processing, and plain ol' media service.

If I'm reading the code for node's http/https modules right, node.js would be an ideal tool for handling these issues: it speaks HTTP/1.1 natively, so the websockets passthrough would work, and it hands the (request, response) pair to the handler function after parsing the HTTP headers, holding off on the body until the handler binds request.on('data', ...) listeners to drain it.
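
To illustrate what I mean, here's a minimal sketch (not our actual service; the port and the `/import` prefix are made-up placeholders) of streaming an upload straight to disk as the chunks arrive:

```js
// Minimal sketch: node parses the headers, then leaves the body on the
// wire until we attach 'data' listeners, so we can stream the upload
// instead of buffering it. Port and path prefix are hypothetical.
var http = require('http');
var fs = require('fs');

http.createServer(function (request, response) {
  if (request.method === 'POST' && request.url.indexOf('/import') === 0) {
    var out = fs.createWriteStream('/tmp/upload-' + Date.now());
    var received = 0;

    request.on('data', function (chunk) {
      received += chunk.length;
      out.write(chunk); // a real version would respect backpressure (pause/resume or pipe)
    });

    request.on('end', function () {
      out.end();
      response.writeHead(200, { 'Content-Type': 'text/plain' });
      response.end('received ' + received + ' bytes\n');
    });
  } else {
    response.writeHead(404);
    response.end();
  }
}).listen(8080);
```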

We have a well-segmented, URL-based namespace for our services: "/import", "/events", "/users", "/api", "/media", etc. Nginx only handles the last three correctly. Would it be difficult or inappropriate to replace Nginx with a node.js application to handle all of them? Or is there some obscure reverse proxy (Nginx, Pound, and Varnish all have similar limitations) that already does everything I want?
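
For concreteness, the kind of switchboard I have in mind would look roughly like this with node-http-proxy; the back-end ports are invented, and the exact proxy API varies between http-proxy versions (this is the `createProxyServer`/`web`/`ws` style):

```js
// Rough sketch of a node.js front end that routes by URL prefix.
// Ports and backends are invented examples.
var http = require('http');
var httpProxy = require('http-proxy');

var proxy = httpProxy.createProxyServer({});

var routes = {
  '/import': 'http://127.0.0.1:3001',
  '/events': 'http://127.0.0.1:3002',
  '/users':  'http://127.0.0.1:3003',
  '/api':    'http://127.0.0.1:3004',
  '/media':  'http://127.0.0.1:3005'
};

function targetFor(url) {
  for (var prefix in routes) {
    if (url.indexOf(prefix) === 0) { return routes[prefix]; }
  }
  return null;
}

var server = http.createServer(function (req, res) {
  var target = targetFor(req.url);
  if (!target) { res.writeHead(404); res.end(); return; }
  proxy.web(req, res, { target: target });
});

// Pass websocket upgrades (socket.io on /events) through as well.
server.on('upgrade', function (req, socket, head) {
  var target = targetFor(req.url);
  if (target) { proxy.ws(req, socket, head, { target: target }); }
});

server.listen(80);
```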

Elf Sternberg

2 Answers


As the other answer states, formidable is a very solid library for handling uploads. By default it buffers to disk, but you can override that behavior and handle the data as it comes in if you need to. So if you wanted to write your own proxy, node.js + formidable would be a great way to get uploads to stream as they come in.
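
For example, here's a rough sketch of that override using formidable's `onPart` hook (worth double-checking against the docs for the version you install; the port is a placeholder):

```js
// Sketch: consume multipart file data as it arrives instead of letting
// formidable buffer it to a tmp file. Based on formidable's onPart hook;
// verify the API against the version you actually install.
var http = require('http');
var formidable = require('formidable');

http.createServer(function (req, res) {
  var form = new formidable.IncomingForm();
  var received = 0;

  form.onPart = function (part) {
    if (!part.filename) {
      // Ordinary form fields: let formidable handle them normally.
      form.handlePart(part);
      return;
    }
    // File part: stream the chunks wherever you like (another server,
    // a transformation pipeline, etc.) instead of writing a tmp file.
    part.on('data', function (chunk) {
      received += chunk.length;
    });
    part.on('end', function () {
      // flush / finalize the downstream consumer here
    });
  };

  form.parse(req, function (err) {
    res.writeHead(err ? 500 : 200, { 'Content-Type': 'text/plain' });
    res.end('received ' + received + ' bytes\n');
  });
}).listen(8080); // placeholder port
```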

You could also try node-http-proxy, but I'm not sure how it buffers, unfortunately. You should also consider that it hasn't been used anywhere near as much as Nginx, so I'm not sure how much I'd trust it exposed directly to the wild (not so much an issue with the library per se, but more with Node).

Have you taken a look at Nginx's `client_body_buffer_size` directive? It seems like setting it to a lower value would solve your memory issues.

ShZ
    `client_body_buffer_size` dictates what happens when the client request exceeds a given size, but it's not a streaming solution. – Elf Sternberg Nov 24 '11 at 18:52
  • Ultimately, I went with `node-http-proxy`, and just [wrote my own switchboard](https://github.com/elfsternberg/node-http-proxy-switchboard) – Elf Sternberg Nov 24 '11 at 18:53

I'm not sure what you are asking (there wasn't any tl;dr!), but you can have a look at these modules: formaline and formidable. Both are battle-hardened, mature, and fast, and both write files to a tmp folder, so they won't easily run out of memory. And as for memory management, the V8 garbage collector is the best.

As for the HTTP proxy, there is again a module, node-http-proxy, which is also battle-hardened and actively used and developed. I believe they made it so that people don't have any reason to use Nginx as a reverse proxy.

As for scaling your application and running multiple processes on multiple machines, I suggest using hook.io. With hook.io you can give every part of your application its own process, and the processes talk to each other by emitting events that the others listen for. It is not completely stable yet, but it's good enough to start developing with.
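
To make the event-passing idea concrete, here is a very rough sketch of two hooks talking to each other; the hook names and event names are invented for illustration, and since hook.io is still changing, the exact API may not match this:

```js
// Very rough sketch of two cooperating hooks. Hook names and event names
// are invented; the hook.io API may differ from what is shown here.
// Normally each hook would live in its own process.
var Hook = require('hook.io').Hook;

// The upload service announces progress as it streams data in.
var uploader = new Hook({ name: 'uploader' });
uploader.on('hook::ready', function () {
  uploader.emit('progress', { file: 'sample.bam', percent: 42 });
});
uploader.start();

// The socket.io/events service listens for progress from any other hook
// and pushes it out to connected customers.
var events = new Hook({ name: 'events' });
events.on('*::progress', function (data) {
  console.log('progress event:', data);
});
events.start();
```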

Farid Nouri Neshat