How does the subscription work?
Until v3.1, the subscription mechanics ( a.k.a. the TOPIC-filter ) were handled on the SUB-side, so this part of the processing got distributed among all the SUB-s ( at a cost of uniformly wide data-traffic across all transport-classes involved ) and there was no penalty on the PUB-side, except for sourcing such data-flow related workload ( ref. below ).
Since v3.1, the TOPIC-filter is processed on the PUB-side, at the cost of that processing overhead, but saving all the previously wasted transport-capacity, which was consumed just to later realise on the SUB-side that the message does not match the TOPIC-filter and will be disposed of.
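For illustration, a minimal pyzmq sketch of this TOPIC-filter mechanics ( the socket names and the inproc:// address are hypothetical, chosen just to keep the sketch self-contained ) -- the SUB-side declares its TOPIC-filter via .setsockopt(), while on v3.1+ the actual match/drop decision happens already on the PUB-side:

```python
import time
import zmq

ctx        = zmq.Context()
s_PUB_send = ctx.socket( zmq.PUB )              # hypothetical socket names
s_SUB_recv = ctx.socket( zmq.SUB )

s_PUB_send.bind(    "inproc://demo" )           # inproc:// keeps the sketch in-process
s_SUB_recv.connect( "inproc://demo" )
s_SUB_recv.setsockopt( zmq.SUBSCRIBE, b"TOPIC-A" )  # the TOPIC-filter ( a prefix-match )

time.sleep( 0.2 )                               # let the subscription propagate upstream

s_PUB_send.send( b"TOPIC-A matching message" )      # passes the TOPIC-filter
s_PUB_send.send( b"TOPIC-B non-matching message" )  # dropped, never delivered to the SUB

aMsg = s_SUB_recv.recv()                        # only the TOPIC-A message arrives
print( aMsg )
```

Note the short sleep: a freshly .connect()-ed SUB is a "slow joiner" and its subscription needs a moment to reach the PUB-side before the first .send().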
Quantitative metric for what "significantly slower" indeed means in-vivo
As postulated in the Question, the comparison ought to relate to:
Scenario A: a PUB-process has no SUB-consumer connected/subscribed to any TOPIC-filter
Scenario B: a PUB-process has one SUB-consumer connected/subscribed to a TOPIC-filter
ZeroMQ has a state-full internal FSA, which saves both the programming architecture and the resources utilisation. This said, Scenario A produces zero workload, i.e. has no impact on the PUB-side processing, as no such processing actually happens until a first real SUB-consumer connects.
If your Scenario B does indeed represent the use-case, the additional processing overhead, related to serving just one single SUB-consumer, is easily measurable:
import zmq
from zmq import Stopwatch as StopWATCH

aStopWATCH = StopWATCH()
# ( s_PUB_send is an already prepared zmq.PUB socket )
# -----------------------------------------------------------------<TEST_SECTION>-start
aStopWATCH.start(); s_PUB_send.send( b"This is a MESSAGE measured for 0 SUB-s", zmq.NOBLOCK ); t0 = aStopWATCH.stop()
# -----------------------------------------------------------------<TEST_SECTION>-end
# .connect() the first SUB-process and let it .setsockopt( zmq.SUBSCRIBE, ... ) for v3.1+ accordingly
# -----------------------------------------------------------------<TEST_SECTION>-start
aStopWATCH.start(); s_PUB_send.send( b"This is a MESSAGE measured for 1 SUB-s", zmq.NOBLOCK ); t1 = aStopWATCH.stop()
# -----------------------------------------------------------------<TEST_SECTION>-end
print( "\nZeroMQ has consumed {0:} [us] for PUB-side processing on [Scenario A]"
       "\nZeroMQ has consumed {1:} [us] for PUB-side processing on [Scenario B]".format( t0, t1 ) )
The same test might be re-used to measure such a difference in case a .connect()-ed ( i.e. the FSA knows about the live counterparty ), but subscribed-to-nothing ( .setsockopt( zmq.SUBSCRIBE, b"" ) ) SUB-consumer's processing is to be validated, irrespective of the actually used { pre-v3.1 | v3.1+ }-API ( just be careful to handle the different versions of the API in distributed-systems, where one cannot enforce uniform API-versions for remote nodes that are outside of one's own control of the Configuration Management ).
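A defensive sketch for such mixed-version deployments ( assuming pyzmq; the flag name below is a hypothetical local variable, not an API constant ) -- check the actual libzmq version at runtime before relying on the PUB-side filtering:

```python
import zmq

# the libzmq version linked at runtime, as a ( major, minor, patch ) tuple
major, minor, patch = zmq.zmq_version_info()

# PUB-side TOPIC-filter processing is available only since v3.1
PUB_SIDE_FILTERING = ( major, minor ) >= ( 3, 1 )   # hypothetical local flag

print( "libzmq {0:} -> TOPIC-filter processed on the {1:}-side".format(
        zmq.zmq_version(),
        "PUB" if PUB_SIDE_FILTERING else "SUB" ) )
```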
And if performance is already bleeding?
One may further fine-tune the performance attributes of projects that are already performance-constrained.
For selected processing tasks, the performance envelope of which one may guess a-priori is not so tough here, one may segregate the workload-streams' processing by mapping each one onto disjunct sub-sets of the multiple created I/O-threads:
map   s_REQ_sock.setsockopt( ZMQ_AFFINITY,   1 );   // I/O-thread 0 ( a bitmask )
and   s_PUB_send.setsockopt( ZMQ_AFFINITY,   2 );   // I/O-thread 1 ( a bitmask )
resp. s_SUB_recv.setsockopt( ZMQ_AFFINITY, ... );
set   s_SUB_recv.setsockopt( ZMQ_MAXMSGSIZE, 32000 ); // protective ceiling
set   s_SUB_recv.setsockopt( ZMQ_CONFLATE,   True );  // retain just the last msg
set   s_SUB_recv.setsockopt( ZMQ_LINGER,     0 );     // avoid blocking on close
set   s_SUB_recv.setsockopt( ZMQ_TOS,        anAppToS_NETWORK_PRIO_CODE );
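The same tuning, expressed as a runnable pyzmq sketch ( socket names are hypothetical; pyzmq exposes the ZMQ_* constants without the ZMQ_ prefix; the ZMQ_TOS line is left out, as its value is deployment-specific ):

```python
import zmq

ctx = zmq.Context( io_threads = 2 )             # create more than one I/O-thread to map onto

s_PUB_send = ctx.socket( zmq.PUB )              # hypothetical socket names
s_SUB_recv = ctx.socket( zmq.SUB )

# ZMQ_AFFINITY is a bitmask of I/O-threads; set it before .bind()/.connect()
s_PUB_send.setsockopt( zmq.AFFINITY,   1 )      # I/O-thread 0
s_SUB_recv.setsockopt( zmq.AFFINITY,   2 )      # I/O-thread 1

s_SUB_recv.setsockopt( zmq.MAXMSGSIZE, 32000 )  # protective ceiling
s_SUB_recv.setsockopt( zmq.CONFLATE,   True )   # retain just the last msg
s_SUB_recv.setsockopt( zmq.LINGER,     0 )      # avoid blocking on .close()
```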