
I am evaluating ZeroMQ as pub-sub (service-bus style) infrastructure for a medium-sized system. We have about 50 nodes, all of which should be both publishers and subscribers. The network is roughly a star topology, but the edge nodes also talk to each other. We require dynamic discovery (no hard-coding of the participants' network addresses) and no SPOF (single point of failure).

I have read http://zeromq.org/whitepapers:0mq-3-0-pubsub and, from what I understand, the suggested 0MQ way to do dynamic discovery involves a proxy node (XPUB/XSUB) that forwards subscriptions and publications. I considered using such a proxy as a central mediator in our system, but I have the following concerns with this architecture: (A) the proxy node is a SPOF: when it fails, the whole system stops functioning; (B) all traffic, including data, passes through the proxy node, which adds latency and makes the proxy a performance bottleneck.

Assuming I understood the pub-sub whitepaper correctly, is there a relatively simple way to achieve pub-sub + dynamic discovery + no SPOF in ZeroMQ?

Additional point: I have ruled out the multicast (PGM) solution because most messages have only one or a few interested parties, and we do not want to flood the network.

dux2

1 Answer


A single publisher with multiple subscribers requires no intermediary, since subscribers can connect directly to the publisher. Many publishers and many subscribers at the same time is harder: unless there's something in the middle, maintenance will be a nightmare, because every new subscriber has to be configured with all existing publishers.

You could deploy several XSUB/XPUB proxies, each on their own machine, then deploy a load-balancer (like F5) between the publishers and the proxies. This achieves load-balancing and fault tolerance on the upstream side.

The proxy code is simple:

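import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Socket;

ZMQ.Context context = ZMQ.context(1);
// Publishers connect to the XSUB (frontend) side.
Socket frontend = context.socket(ZMQ.XSUB);
frontend.bind("tcp://proxy1:5444");
// Subscribers connect to the XPUB (backend) side.
Socket backend = context.socket(ZMQ.XPUB);
backend.bind("tcp://proxy1:5555");
// No subscribe() call on the XSUB socket: the proxy forwards subscriptions arriving on the XPUB side upstream.
ZMQ.proxy(frontend, backend, null);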

If a proxy node fails, just restart it; re-connections/subscriptions should be handled automatically by zmq.

For downstream subscribers, connect each subscriber directly to all available proxies:

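// ctx is assumed to be an org.zeromq.ZContext.
Socket subscriber = ctx.createSocket(ZMQ.SUB);
subscriber.connect("tcp://proxy1:5555");
subscriber.connect("tcp://proxy2:5555");
subscriber.connect("tcp://proxy3:5555");
// A SUB socket receives nothing until it subscribes; an empty prefix matches every message.
subscriber.subscribe("".getBytes());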

Publishers will come and go more often than proxies, so connecting subscribers directly to proxies results in less configuration maintenance since the number of proxies will, for the most part, be static.

If a proxy node fails, the upstream load balancers (F5 LTMs) route traffic to the remaining proxy nodes; the subscribers won't be affected, since they consume from all available proxies.
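
The failover reasoning above can be sketched as a small plain-Java model. It makes no ZeroMQ calls and is only an illustration under stated assumptions: `ProxyFailoverModel` and its methods are hypothetical names, the load balancer is assumed to pick proxies round-robin among the healthy ones, each publisher is assumed to hold one connection that is re-routed when its proxy fails, and the subscriber is assumed to stay connected to every proxy. Messages in flight at the moment of a crash are not modelled; they would likely be lost.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java model of the topology (no ZeroMQ); every name here is hypothetical. */
class ProxyFailoverModel {
    private final boolean[] healthy;                  // healthy[i] is true while proxy i is up
    private int next = 0;                             // round-robin cursor of the assumed load balancer
    private final Map<String, Integer> assignment = new LinkedHashMap<>(); // publisher -> proxy index

    ProxyFailoverModel(int proxyCount) {
        healthy = new boolean[proxyCount];
        Arrays.fill(healthy, true);
    }

    /** Assumed load-balancer policy: next healthy proxy in round-robin order, or -1 if none is up. */
    private int route() {
        for (int tried = 0; tried < healthy.length; tried++) {
            int candidate = next;
            next = (next + 1) % healthy.length;
            if (healthy[candidate]) return candidate;
        }
        return -1;
    }

    /** A publisher opens one connection through the load balancer. */
    void connect(String publisher) {
        assignment.put(publisher, route());
    }

    /** A proxy crashes; publishers that were attached to it reconnect and are re-routed. */
    void fail(int proxy) {
        healthy[proxy] = false;
        for (Map.Entry<String, Integer> e : assignment.entrySet()) {
            if (e.getValue() == proxy) e.setValue(route());
        }
    }

    /** A subscriber connected to all proxies still hears a publisher if that publisher's proxy is up. */
    boolean reachable(String publisher) {
        Integer proxy = assignment.get(publisher);
        return proxy != null && proxy >= 0 && healthy[proxy];
    }

    Map<String, Integer> assignment() {
        return assignment;
    }

    public static void main(String[] args) {
        ProxyFailoverModel model = new ProxyFailoverModel(3);
        model.connect("pubA");
        model.connect("pubB");
        model.connect("pubC");
        model.fail(1); // the second proxy goes down
        System.out.println(model.assignment());
    }
}
```

In this model no subscriber-side change is needed after the crash: only the publisher that was attached to the failed proxy moves, which mirrors why the subscriber configuration can stay static.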

Slow subscribers may be addressed with syncing; read up on this.
Check out subscription forwarding and minimizing network traffic here.
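
To illustrate why subscription forwarding reduces traffic: ZeroMQ subscriptions are byte-prefix filters, and an XPUB/XSUB proxy passes them upstream so that unwanted messages can be dropped before they cross the network. The plain-Java model below sketches only that matching rule. It makes no ZeroMQ calls, `SubscriptionFilter` and its methods are hypothetical names, and the topic strings are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of prefix-based subscription filtering (no ZeroMQ); names are hypothetical. */
class SubscriptionFilter {
    private final List<byte[]> prefixes = new ArrayList<>();

    /** Registers a subscription prefix; an empty prefix matches every message. */
    void subscribe(String prefix) {
        prefixes.add(prefix.getBytes(StandardCharsets.UTF_8));
    }

    /** True if the message starts with at least one subscribed prefix. */
    boolean matches(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        for (byte[] prefix : prefixes) {
            if (startsWith(body, prefix)) return true;
        }
        return false;
    }

    private static boolean startsWith(byte[] body, byte[] prefix) {
        if (prefix.length > body.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (body[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Messages that would be sent on, given the subscriptions known so far. */
    List<String> forwarded(List<String> messages) {
        List<String> out = new ArrayList<>();
        for (String m : messages) {
            if (matches(m)) out.add(m);
        }
        return out;
    }

    public static void main(String[] args) {
        SubscriptionFilter filter = new SubscriptionFilter();
        filter.subscribe("orders.");
        List<String> messages = new ArrayList<>();
        messages.add("orders.new 42");
        messages.add("prices.eur 1.1");
        messages.add("orders.cancel 7");
        System.out.println(filter.forwarded(messages));
    }
}
```

With no subscriptions registered, nothing is forwarded at all, which fits the question's case where most messages have only one or a few interested parties.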


raffian
  • Actually, there is something I do not understand in the proposed solution: when a subscriber subscribes, the round-robin DNS will redirect the subscription message to some LTM, which will redirect it to some (single?) proxy that holds the subscription. If this proxy machine crashes, the subscription will be lost, won't it? – dux2 Aug 21 '13 at 06:10
  • Thanks. In this solution, how do you avoid statically configuring the list of publishers in each proxy? How do you handle a late-joining publisher? I can imagine a publisher "announcing" its existence to each proxy when starting up, after which the proxy sends all subscriptions to the new publisher. How does a proxy recover from a crash? It has to request re-subscription from all subscribers or make subscriptions persistent. This is all possible, but it is a lot of code to write, almost like developing your own pub-sub from scratch. – dux2 Aug 22 '13 at 12:33
  • Updated again, hope it helps. – raffian Aug 22 '13 at 14:42
  • In the last revision, how does a proxy connect to the actual publisher, or vice versa? Also, would you advise implementing dynamic discovery using multicast (DDS style) instead of using a proxy? Wouldn't that be simpler? – dux2 Aug 28 '13 at 14:12
  • The proxies `bind`, the pubs and subs `connect` – raffian Aug 28 '13 at 14:14
  • Could you add a sample code for the publisher? Also please note my second question in the previous comment (about multicast alternative). – dux2 Aug 28 '13 at 14:58
  • You can get code sample for any language here: https://github.com/imatix/zguide/tree/master/examples/ . I am not certain on the multicast question, sorry. – raffian Aug 28 '13 at 15:08