
Web dynos can handle HTTP requests, and while they do, worker dynos can process jobs handed off from them. But I don't know how to make web dynos and worker dynos communicate with each other.

For example, I want a web dyno to receive an HTTP request, send the job to a worker dyno, have the worker process the job and send the result back to the web dyno, and then show the result on the web.

Is this possible in Node.js (with RabbitMQ, Kue, or something similar)? I could not find an example in the Heroku documentation.

Or should I implement all the code in web dynos and scale web dynos only?

jwchang

2 Answers


As the high-level article on background jobs and queuing suggests, your web dynos will need to communicate with your worker dynos via an intermediate mechanism (often a queue).

To accomplish what it sounds like you're hoping to do, follow this general approach (sketched in code after the list):

  • Web request is received by the web dyno
  • Web dyno adds a job to the queue
  • Worker dyno receives job off the queue
  • Worker dyno executes job, writing incremental progress to a shared component
  • Browser-side polling requests status of job from the web dyno
    • Web dyno queries shared component for progress of background job and sends state back to browser
  • Worker dyno completes execution of the job and marks it as complete in shared component
  • Browser-side polling requests status of job from the web dyno
    • Web dyno queries shared component for progress of background job and sends completed state back to browser
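
By way of illustration only (the answer doesn't prescribe a framework), here is a minimal sketch of the web dyno side using Express; enqueueJob and getJobState are hypothetical helpers that the queue and shared-state sketches below flesh out:

var express = require('express');
var app = express();
app.use(express.json());   // built-in body parser, Express 4.16+

// Steps 1 and 2: receive the request and add a job to the queue.
app.post('/jobs', function (req, res) {
    var jobId = Date.now().toString();   // naive id, for illustration only
    enqueueJob(jobId, req.body)          // hypothetical; see the queue sketch below
        .then(function () { res.send({ id: jobId }); });
});

// Polling steps: the browser asks for status; the web dyno reads shared state.
app.get('/jobs/:id', function (req, res) {
    getJobState(req.params.id, function (err, state) {   // hypothetical; see the Redis sketch
        res.send({ id: req.params.id, state: state });
    });
});

app.listen(process.env.PORT);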

As far as actual implementation goes, I'm not too familiar with the best libraries in Node.js, but the components that glue this process together are available on Heroku as add-ons.

Queue: AMQP is a well-supported queue protocol and the CloudAMQP add-on can serve as the message queue between your web and worker dynos.
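
For a concrete picture, here is a minimal sketch of both halves using the amqplib client (my choice purely for illustration; no particular Node library is prescribed here). The "jobs" queue name is invented, and CLOUDAMQP_URL is the config var the CloudAMQP add-on sets:

var amqp = require('amqplib');

// Web dyno: push a job onto the queue (this would back the enqueueJob
// helper used in the earlier sketch).
function enqueueJob(jobId, payload) {
    return amqp.connect(process.env.CLOUDAMQP_URL).then(function (conn) {
        return conn.createChannel().then(function (ch) {
            return ch.assertQueue('jobs', { durable: true }).then(function () {
                ch.sendToQueue('jobs',
                    Buffer.from(JSON.stringify({ id: jobId, payload: payload })),
                    { persistent: true });
                return conn.close();
            });
        });
    });
}

// Worker dyno: consume jobs and acknowledge each one once it has finished,
// so the broker knows the message can be discarded.
amqp.connect(process.env.CLOUDAMQP_URL).then(function (conn) {
    return conn.createChannel().then(function (ch) {
        ch.assertQueue('jobs', { durable: true });
        ch.consume('jobs', function (msg) {
            var job = JSON.parse(msg.content.toString());
            runJob(job, function () {   // runJob is a hypothetical job handler
                ch.ack(msg);
            });
        });
    });
});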

Shared state: You can use one of the Postgres add-ons to share the state of a job being processed, or something more performant such as Memcache or Redis.
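
And a sketch of the shared-state piece using the node_redis client (the key layout is invented, and REDIS_URL stands in for whichever config var your Redis add-on provides). The worker writes the job's state as it goes, and the web dyno reads it when the browser polls:

var redis = require('redis');

// node_redis 2.x/3.x callback-style API; accepts a connection URL.
var client = redis.createClient(process.env.REDIS_URL);

// Worker dyno: record incremental progress, then the completed result.
function reportProgress(jobId, percent) {
    client.set('job:' + jobId + ':state', 'running:' + percent);
}
function reportDone(jobId, result) {
    client.set('job:' + jobId + ':state', 'complete:' + result);
}

// Web dyno: read the job's state when the browser polls (this would back
// the getJobState helper used in the earlier sketch).
function getJobState(jobId, callback) {
    client.get('job:' + jobId + ':state', callback);
}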

So, to summarize: you must use an intermediate add-on component to communicate between dynos on Heroku. While this approach involves a little more engineering, the result is a properly decoupled and scalable architecture.

Ryan Daigle
  • I have one more question on this. When I use AMQP, how would you guarantee that each job is handled by exactly one worker dyno, with no duplication? To me, AMQP seems similar to a TCP socket: broadcast an event, listen for the event, and do something. If an "enqueue" event happened, multiple worker dynos would react to it and try to dequeue at the same time. How can I handle this problem? – jwchang Jul 11 '12 at 18:22
  • While queue behavior varies between queues and client libraries, the default behavior is usually _not_ to broadcast. So by default, when a message is consumed off the queue, it is done so by the first receiver to get there and is then removed from the queue. – Ryan Daigle Jul 11 '12 at 20:35
  • In AMQP you have Exchanges, to which you publish messages, and Queues, from which you get messages; "bindings" between them route the messages from an Exchange to one or more Queues. If you only have one binding between an Exchange and a Queue (which is the default), you're guaranteed that only unique messages go to each subscriber of that Queue. – Carl Hörberg Jul 12 '12 at 07:16
  • Also, AMQP has other nice benefits like in-order guarantees, and features like message persistence, high-availability (mirrored) queues, etc. (Disclosure: I own CloudAMQP) – Carl Hörberg Jul 12 '12 at 07:19
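
To make the distinction in the comments above concrete, here is a hedged sketch (again with amqplib; the names are invented): workers sharing one queue get messages round-robined between them, while broadcast only happens if each consumer binds its own queue to a fanout exchange:

var amqp = require('amqplib');

amqp.connect(process.env.CLOUDAMQP_URL).then(function (conn) {
    return conn.createChannel().then(function (ch) {
        // Work-queue pattern (the default described above): all workers share
        // one queue, so each message is delivered to exactly one worker.
        ch.assertQueue('jobs', { durable: true });
        ch.consume('jobs', handleJob, { noAck: true });   // handleJob is hypothetical

        // Broadcast pattern, for contrast: each worker binds its *own*
        // exclusive queue to a fanout exchange, so every worker sees every
        // message. You would only do this deliberately.
        ch.assertExchange('events', 'fanout', { durable: false });
        return ch.assertQueue('', { exclusive: true }).then(function (q) {
            ch.bindQueue(q.queue, 'events', '');
            ch.consume(q.queue, handleEvent, { noAck: true });   // handleEvent is hypothetical
        });
    });
});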

From what I can tell, Heroku does not supply a way of communicating between dynos for you, so you will have to build that yourself. To communicate with another process using Node, you would probably have to deal with the process's stdin/stdout/stderr manually, something like this:

var fs = require('fs');

// Attach to another process's standard streams through the /proc
// filesystem (Linux-only, and only if both processes share a filesystem).
var attachToProcess = function(pid) {
    return {
        stdin: fs.createWriteStream('/proc/' + pid + '/fd/0'),
        stdout: fs.createReadStream('/proc/' + pid + '/fd/1'),
        stderr: fs.createReadStream('/proc/' + pid + '/fd/2')
    };
};

// Read the worker's pid from its pid file, then write to its stdin.
fs.readFile('/path/to/worker.pid', 'utf8', function(err, pid) {
    if (err) { throw err; }
    var worker = attachToProcess(Number(pid));
    worker.stdin.write(...);
});

Then, in your worker process, you will have to store the pid in that pid file:

var fs = require('fs');

// process.pid is a number, so write it out as a string.
fs.writeFile('/path/to/worker.pid', String(process.pid), function(err) {
    if (err) { throw err; }
});

I haven't actually tested any of this, so it will likely take some work and building on it, but I think the basic idea is clear.

Edit

I just noticed that you tagged this with "redis" as well, and thought I should add that you can also use redis pub/sub to communicate between your various processes as explained in the node_redis readme.
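
For completeness, a minimal pub/sub sketch with node_redis (the channel name is invented; note that pub/sub is fire-and-forget, so a message published while no subscriber is listening is simply lost):

var redis = require('redis');

// Subscriber (e.g. the worker): a subscribed connection can't issue other
// commands, so it gets a client of its own.
var sub = redis.createClient(process.env.REDIS_URL);
sub.subscribe('jobs');
sub.on('message', function (channel, message) {
    console.log('received on %s: %s', channel, message);
});

// Publisher (e.g. the web dyno):
var pub = redis.createClient(process.env.REDIS_URL);
pub.publish('jobs', JSON.stringify({ task: 'example' }));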

kbjr
  • [Heroku dynos](https://devcenter.heroku.com/articles/dynos) are each virtualized, meaning they do not share the same filesystem, even within the same app. So communicating via process ids from one dyno to another won't work. – Ryan Daigle Jul 11 '12 at 17:26
  • @RyanDaigle Yeah, I thought there might be some issue there. The idea about Redis is still valid, though. – kbjr Jul 13 '12 at 08:01
  • Definitely. Using Redis as an intermediary (or some other queue library) is the right approach. – Ryan Daigle Jul 13 '12 at 13:53