2

I have recently started to use ZeroMQ and have observed that sometimes initialization of the ZeroMQ context can take as much as 40 seconds. Usually it's much faster than that.

Eg:

import zmq

# the next line will sometimes take 40 seconds
# or so but is usually near instantaneous
with zmq.Context.instance() as context:
    # connect / bind to socket, do stuff
    pass

This behaviour seems to be independent of whether or not I use a context manager, eg the following behaves the same as the above:

import zmq

# the next line will sometimes take 40 seconds
# or so but is usually near instantaneous
context = zmq.Context.instance()

# connect / bind to socket, do stuff

context.term()

This is running in Python 3.6.8 on Ubuntu 18.04.2

I haven't been able to establish any pattern to the slowdowns.

What are the possible causes of this behaviour?

Update

Per @user3666197 suggestion, I took some timings and managed to get the slowdown to manifest on the second run. Things returned to "normal" after that.

The slowdown kicked in on the second run of the test script:

 = zmq.Context()] took +      704 [us] to execute
 1.~                          704  us
 2.~                  131 766 816  us
 3.~                          145  us
 4.~                          137  us

Watching memory on htop at the time did not show memory filling up at all but swap was already active.

Update 2

Using a pruned-down version of @user3666197's suggested script:

import sys

import zmq
clock = zmq.Stopwatch()

clock.start()
foo = zmq.Context()
bar = clock.stop()
print("{0:>2s}: {1:>9d}".format(sys.argv[1], bar))
foo.term()

And calling it in a loop, eg:

for x in $(seq 1 20);
do python3 test.zmq.py $x;
done

I can repeat the issue on the original server but have yet to hit it on another server, which suggests an environmental issue (or something to do with the specific distro version and/or python version as the "good" server is older).

This is sufficient for me for now but I will come back to it later and gather enough info to submit something to the maintainers.

LRE
  • 956
  • 1
  • 10
  • 15
  • Would you @LRE mind to also post here a description ( result ) from [ lstopo-no-graphics -.ascii ]-command? An ordinary mouse-select and Ctrl-Ins [copy] / Shift-Ins [paste] from terminal works for this – user3666197 Feb 17 '20 at 00:50
  • 1
    @user3666197, that package doesn't seem to be on that box. I'll have to see about getting clearance to install it – LRE Feb 18 '20 at 01:03
  • The common name of the package is `hwloc` - yet, details depend on the package-manager ( `lstopo` being the core utility to call from the commandline, as you have already tested, haven't you ? + G/L with getting the clearance adminstration. This module is quite common in NUMA system maps, if that can help you get it done. ) – user3666197 Feb 18 '20 at 01:07

1 Answers1

2

In case one has never worked with ZeroMQ,
one may here enjoy to first look at "ZeroMQ Principles in less than Five Seconds"
before diving into further details


Q : " What are the possible causes of this ( only "sometimes" present ) behaviour? "
I haven't been able to establish any pattern to the slowdowns.

a ) those 40 seconds,
mentioned above, could make sense but with a RAM to swap-I/O operations. I cannot imagine any other reason for such block of time ( 've never noticed anything close to the 3/4 of a minute delay in ZeroMQ during my last ~10 years with ZeroMQ, never )

b ) the reality check
always measure the respective actual times of execution, before any claim get raised :
let's start to be systematic and fair on repeatable and documented test-cases, ok?
Categories like "sometimes" are not worth, data are. As a start, we may use the following templated approach, to discriminate the root cause of any actual delay :

M = "[{0:_>40s}] took +{1:_>9d} [us] to execute"
N = 0
from zmq import Stopwatch; aClk = Stopwatch()
#------------------------------
pass;                      aClk.start()
import zmq
pass;                  _ = aClk.stop(); print( M.format( "IMPORT", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX = zmq.Context()
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context()", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------
pass;                      aClk.start()
context = zmq.Context.instance()
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context.instance()", _ ) )
#------------------------------
pass;                      aClk.start()
context.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------
pass;                      aClk.start()
with zmq.Context.instance() as aCtxInCtxMGR:
     N +=1
pass;                  _ = aClk.stop(); print( M.format( "with … as aCtxInCtxMGR:", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX = zmq.Context(  0 )
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context(  0 )", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX = zmq.Context( 10 )
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 10 )", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX = zmq.Context( 20 )
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 20 )", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX = zmq.Context( 50 )
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 50 )", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX = zmq.Context( 99 )
pass;                  _ = aClk.stop(); print( M.format( "= zmq.Context( 99 )", _ ) )
#------------------------------
pass;                      aClk.start()
aCTX.term()
pass;                  _ = aClk.stop(); print( M.format( ".term()", _ ) )
#------------------------------

et cetera
ad plausible


Your actual mileage may vary
( and will )

[__________________________________IMPORT] took +_____1198 [us] to execute
[_________________________= zmq.Context()] took +_____1523 [us] to execute
[________________= zmq.Context.instance()] took +_____1348 [us] to execute
[_________________with … as aCtxInCtxMGR:] took +_____7909 [us] to execute
[_____________________= zmq.Context(  0 )] took +______966 [us] to execute
[_____________________= zmq.Context( 10 )] took +______944 [us] to execute
[_____________________= zmq.Context( 20 )] took +______936 [us] to execute
[_____________________= zmq.Context( 50 )] took +______962 [us] to execute
[_____________________= zmq.Context( 99 )] took +_____1015 [us] to execute

[__________________________________IMPORT] took +______879 [us] to execute
[_________________________= zmq.Context()] took +_____1076 [us] to execute
[_________________________________.term()] took +______881 [us] to execute
[________________= zmq.Context.instance()] took +_____1105 [us] to execute
[_________________________________.term()] took +______896 [us] to execute
[_________________with … as aCtxInCtxMGR:] took +_____1474 [us] to execute
[_____________________= zmq.Context(  0 )] took +_____1163 [us] to execute
[_________________________________.term()] took +______982 [us] to execute
[_____________________= zmq.Context( 10 )] took +_____2646 [us] to execute
[_________________________________.term()] took +______873 [us] to execute
[_____________________= zmq.Context( 20 )] took +_____2689 [us] to execute
[_________________________________.term()] took +______981 [us] to execute
[_____________________= zmq.Context( 50 )] took +_____2772 [us] to execute
[_________________________________.term()] took +______910 [us] to execute
[_____________________= zmq.Context( 99 )] took +_____1086 [us] to execute
[_________________________________.term()] took +_____1010 [us] to execute
user3666197
  • 1
  • 6
  • 50
  • 92
  • 1
    The slowdown kicked in on the second run of the test script: ` = zmq.Context()] took + 704 [us] to execute`, test 1: 704us, test 2: 131766816us, test 3: 145us, test 4: 137us – LRE Feb 17 '20 at 00:05
  • @LRE Best print( zmq.zmq_version(), zmq.pyzmq_version() ), copy your site-specific details, including the garbage-collection policy and GIL-timing details and file an incident at module maintainers for resolving & cleaning the issue. – user3666197 Feb 17 '20 at 00:39
  • user3666197, by "module maintainers" are you referring to the maintainers for `pyzmq` or `ZeroMQ` itself? – LRE Feb 18 '20 at 01:10
  • pyzmq, obviously ( 've never met this with native zmq ) – user3666197 Feb 18 '20 at 01:35