
Is it significantly slower to publish
when there is one subscriber vs. no subscribers at all?

More Detail:

We're writing a ZeroMQ application, where speed is very important.

We have many nodes that communicate via REQ/REP as well as PUB/SUB, and the network automatically selects the { ipc: | tcp: } transport-class when the nodes are on the same machine.

We'd like to sometimes log the messages between certain nodes. With PUB/SUB this is easy, we just have a "logging node" subscribed to the publisher. However, with REQ/REP, we cannot read the request/response without becoming a proxy or otherwise slowing down the connection.

We're considering having every node that uses REQ/REP also publish each message it sends to a unique TCP address ( so each node has its own "logging address" that receives a copy of all of its messages ); we would then subscribe to the "logging addresses" we're interested in whenever we want to log.
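A minimal sketch of that mirroring idea, with all names hypothetical and the real ZeroMQ sockets abstracted away as plain callables ( `send_request` standing in for the REQ-socket send, `publish_log` for the PUB-socket send on the "logging address" ):

```python
# Hypothetical sketch: mirror every REQ/REP message to a logging publisher.
# `send_request` and `publish_log` stand in for the real ZeroMQ socket calls.

class LoggingReqWrapper:
    def __init__(self, send_request, publish_log):
        self.send_request = send_request   # the real REQ-socket send
        self.publish_log  = publish_log    # PUB-side send on the "logging address"

    def send(self, message):
        self.publish_log(b"LOG " + message)  # mirrored copy for any log subscriber
        return self.send_request(message)    # the normal REQ/REP path

# usage, with plain lists standing in for sockets:
sent, logged = [], []
node = LoggingReqWrapper(sent.append, logged.append)
node.send(b"do-work")
```

Whether the mirrored `publish_log` call costs anything when nobody is subscribed is exactly the question below.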

Question:

Will we suffer a performance penalty if we ARE NOT subscribed to the "logging address"? A slowdown during logging is okay, but performance penalties during normal operation are not desirable.


1 Answer


How does the subscription work?

Until v3.1, the subscription mechanics ( a.k.a. the TOPIC-filter ) were handled on the SUB-side, so this part of the processing was distributed among all the SUB-s ( at the cost of uniformly wide data-traffic across all the transport-classes involved ) and there was no penalty on the PUB-side, except for sourcing the data-flow related workload itself ( ref. below ).

Since v3.1, the TOPIC-filter is processed on the PUB-side, at the cost of that processing overhead, but saving all the transport capacity previously wasted on carrying messages that the SUB-side would only later find not to match its TOPIC-filter and dispose of.
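Either way, the TOPIC-filter itself is just a byte-wise prefix match against the start of each message; a pure-Python sketch of the matching rule ( independent of which side applies it ):

```python
def topic_matches(message: bytes, subscriptions) -> bool:
    """True if the message's leading bytes match any subscribed TOPIC-filter.
    An empty filter b"" matches everything; no subscriptions match nothing."""
    return any(message.startswith(topic) for topic in subscriptions)

# examples:
topic_matches(b"weather.london 12C", [b"weather."])  # matched: delivered
topic_matches(b"sports.score 1:0",  [b"weather."])   # not matched: dropped
topic_matches(b"anything",          [b""])           # empty filter: match-all
topic_matches(b"anything",          [])              # no subscription: nothing
```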


Quantitative metric for what "significantly slower" indeed means in-vivo

As postulated in the Question, the comparison ought to be between:
Scenario A: a PUB-process has no SUB-consumer connected/subscribed to any TOPIC-filter
Scenario B: a PUB-process has one SUB-consumer connected/subscribed to a TOPIC-filter

ZeroMQ internally runs a state-full FSA ( Finite State Automaton ), which saves both the programming architecture and the resources utilisation. This said, Scenario A produces zero workload, i.e. has no impact on PUB-side processing, as no such processing happens at all until the first real SUB connects.
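The point can be illustrated with a trivial toy model ( pure Python, not the real ZeroMQ internals ): if the per-message PUB-side work amounts to a loop over the currently connected SUB-consumers, then an empty consumer list makes the loop body a no-op, so Scenario A adds essentially nothing.

```python
def publish(message, subscribers, deliveries):
    """Toy model of PUB-side work: one unit of work per connected SUB.
    With no subscribers the loop body never executes -- zero added work."""
    work_done = 0
    for sub in subscribers:
        deliveries.append((sub, message))
        work_done += 1
    return work_done

box = []
publish(b"msg", [], box)         # Scenario A: no SUB connected, no work done
publish(b"msg", ["SUB-1"], box)  # Scenario B: one SUB, one delivery's worth of work
```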

If your Scenario B does indeed represent the use-case, the additional processing overhead related to serving just one single SUB-consumer is easily measurable:

import zmq
from zmq import Stopwatch

aStopWATCH = Stopwatch()
aContext   = zmq.Context()
s_PUB_send = aContext.socket( zmq.PUB )
s_PUB_send.bind( "tcp://127.0.0.1:5557" )        # an illustrative address
# -----------------------------------------------------------------<TEST_SECTION>-start
aStopWATCH.start()
s_PUB_send.send( b"This is a MESSAGE measured for 0 SUB-s", zmq.NOBLOCK )
t0 = aStopWATCH.stop()
# -----------------------------------------------------------------<TEST_SECTION>-end


# .connect() the first SUB-process and let it .setsockopt() for v3.1+ accordingly


# -----------------------------------------------------------------<TEST_SECTION>-start
aStopWATCH.start()
s_PUB_send.send( b"This is a MESSAGE measured for 1 SUB-s", zmq.NOBLOCK )
t1 = aStopWATCH.stop()
# -----------------------------------------------------------------<TEST_SECTION>-end

print( "\nZeroMQ has consumed {0:} [us] for PUB-side processing on [Scenario A]"
       "\nZeroMQ has consumed {1:} [us] for PUB-side processing on [Scenario B]".format( t0, t1 ) )

The same test may be re-used to measure the difference for a SUB-consumer that is .connect()-ed ( so the FSA knows about the live counterparty ) but has no TOPIC-filter subscription set at all ( note: a SUB receives nothing until a .setsockopt( zmq.SUBSCRIBE, ... ) is issued, whereas an empty-string filter b"" subscribes to everything ), irrespective of the { pre-v3.1 | v3.1+ }-API actually used. Just be careful to handle different API versions in distributed systems, where one cannot enforce uniform API versions on remote nodes that are outside of one's own Configuration Management control.


And if performance is already bleeding?

One may further fine-tune performance attributes for already performance-constrained projects.

For selected processing tasks, one may segregate the individual workload-streams' processing by mapping each of them onto a disjunct sub-set of the multiple I/O-threads created:

s_REQ_sock.setsockopt( zmq.AFFINITY,   1 )      # bitmask: pin REQ/REP traffic to I/O-thread 0
s_PUB_send.setsockopt( zmq.AFFINITY,   2 )      # bitmask: pin PUB traffic to I/O-thread 1
s_SUB_recv.setsockopt( zmq.AFFINITY,   ... )    # bitmask per workload-stream ( 0 == no affinity )

s_SUB_recv.setsockopt( zmq.MAXMSGSIZE, 32000 )  # protective ceiling
s_SUB_recv.setsockopt( zmq.CONFLATE,   1 )      # retain just the last message
s_SUB_recv.setsockopt( zmq.LINGER,     0 )      # avoid blocking on close
s_SUB_recv.setsockopt( zmq.TOS,        anAppToS_NETWORK_PRIO_CODE )  # ToS-based network priority
  • So it sounds like I won't incur a performance penalty for always sending logging messages to a logging address because zeromq knows that there is no subscriber attached? – seveibar Jul 28 '16 at 20:25
  • 1
    Roger that, Sir. Sending side **F**inite **S**tate **M**achine is saving our worries in this manner. No consumer **`.connect()`-ed** means no task to perform at all on the **`PUB`** side. Isn't that a lovely use of just-enough system analysis? With a **reverse `.bind() / .connect` one may even control this behaviour right from the `PUB`-side processing** - isn't that great for a production switch for an in-vivo self-diagnostics? – user3666197 Jul 28 '16 at 22:51
  • Perfect! Makes sense to me! – seveibar Jul 29 '16 at 03:18