I'm new to the world of ZeroMQ and I'm working through the documentation of both NetMQ and ZeroMQ as I go. I'm currently implementing (or preparing to implement) the Paranoid Pirate Pattern, and hit a snag. I have a single app which is running the server(s), clients, and eventually queue, though I haven't implemented the queue yet. Right now, there should only be one server at a time running. I can launch as many clients as I like, all communicating with the single server. I am able to have my server "crash" and restart it (manually for now, automatically soon). That all works. Or at least, restarting the server works once.

To enforce that there's only a single server running, I have a thread (which I'll call the WatchThread) which opens a response socket that binds to an address and polls for messages. When the server dies, it signals its demise and the WatchThread decrements the count when it receives the signal. Here's the code snippet that is failing:

//This is the server's main loop:

        public void Start(object? count)
        {
            num = (int)(count ?? -1);
            _model.WriteMessage($"Server {num} up");
            var rng = new Random();
            using ResponseSocket server = new();

            server.Bind(tcpLocalhost); //This is for talking to the clients
            int cycles = 0;
            while (true)
            {
                var message = server.ReceiveFrameString();
                if (message == "Kill")
                {
                    server.SendFrame("Dying");
                    return;
                }
                if (cycles++ > 3 && rng.Next(0, 16) == 0)
                {
                    _model.WriteMessage($"Server {num} \"Crashing\"");
                    RequestSocket sock = new(); //This is for talking to the WatchThread
                    sock.Connect(WatchThreadString);
                    sock.SendFrame("Dying"); //This isn't working correctly
                    return;
                }
                if(cycles > 3 && rng.Next(0, 10) == 0)
                {
                    _model.WriteMessage($"Server {num}: Slowdown");
                    Thread.Sleep(1000);
                }
                server.SendFrame($"Server{num}: {message}");
            }

        }

And here's the WatchThread code:

    public const string WatchThreadString = "tcp://localhost:5000";

    private void WatchServers()
    {
        _watchThread = new ResponseSocket(WatchThreadString);


        _watchThread.ReceiveReady += OnWatchThreadOnReceiveReady;

        while (_listen)
        {
            bool result = _watchThread.Poll(TimeSpan.FromMilliseconds(1000));
        }
    }

    private void OnWatchThreadOnReceiveReady(object? s, NetMQSocketEventArgs a)
    {
        lock (_countLock)
        {
            ServerCount--;
        }

        _watchThread.ReceiveFrameBytes();
    }

As you can see, it's pretty straightforward. What am I missing? What should happen is exactly what happens the first time everything is instantiated: the server is supposed to go down, so it opens a new socket to the pre-existing WatchThread and sends a frame. The WatchThread receives the message and decrements the counter appropriately. It's only with the second server that things don't behave as expected...

Edit: I was able to get it to work by unbinding/closing _watchThread and recreating it... it's definitely suboptimal and it still seems like I'm missing something. It's almost as if for some reason I can only use that socket once, though I have other request sockets being used multiple times.
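
The "can only use that socket once" symptom is consistent with how request/reply sockets behave: a REP socket has to alternate strictly between receiving and sending, and the handler above receives the "Dying" frame without ever sending a reply. This is an assumption about the cause, not a confirmed diagnosis. A minimal stand-alone sketch of the round trip with an acknowledgement follows; the address, the "Ack" payload, and the variable names are hypothetical and not part of my app:

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

// Hypothetical stand-alone sketch; address and names are illustrative only.
const string address = "tcp://127.0.0.1:5000";
int serverCount = 2;

// One long-lived REP socket, playing the role of the WatchThread socket.
using var watch = new ResponseSocket();
watch.Bind(address);

for (int i = 0; i < 2; i++)
{
    // Each "dying" server opens its own REQ socket, as in the code above.
    using var dying = new RequestSocket();
    dying.Connect(address);
    dying.SendFrame("Dying");

    // REP side: receive the request and update the count...
    string msg = watch.ReceiveFrameString();
    serverCount--;

    // ...then reply, which puts the REP socket back into its receive state.
    watch.SendFrame("Ack");

    // REQ side: consume the reply before the socket is disposed.
    string reply = dying.ReceiveFrameString();
    Console.WriteLine($"{msg} -> {reply}, count = {serverCount}");
}
```

If the acknowledgement carries no information, a PUSH/PULL pair might fit a one-way "I'm dying" signal better than REQ/REP, since that pair has no reply step to forget.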

Additional Edit: My netstat output with 6 clients running (kubernetes is in my hosts file as 127.0.0.1, as is detailed here):

  TCP    127.0.0.1:5555         MyComputerName:0       LISTENING
  TCP    127.0.0.1:5555         kubernetes:64243       ESTABLISHED
  TCP    127.0.0.1:5555         kubernetes:64261       ESTABLISHED
  TCP    127.0.0.1:5555         kubernetes:64264       ESTABLISHED
  TCP    127.0.0.1:5555         kubernetes:64269       ESTABLISHED
  TCP    127.0.0.1:5555         kubernetes:64272       ESTABLISHED
  TCP    127.0.0.1:5555         kubernetes:64273       ESTABLISHED
Isaac
  • From cmd.exe, run netstat -a, which will give the status of connections. Check the status when it fails. You cannot connect a 2nd time if the previous connection still exists. – jdweng Oct 04 '22 at 16:17
  • @jdweng Interesting... it's still showing 2 connections as established on that port and 1 listening. Unbinding the watchThread eliminates the listening connection, but doesn't affect the 2 established connections. It looks like my connection was still lingering... I've fixed that, but it still exhibits the same problematic behavior. – Isaac Oct 04 '22 at 17:10
  • You should also check the client with netstat. The server has to be in a listening state when a new connection is made. Only one connection can exist with the same three properties: 1) Source IP 2) Destination IP 3) Port number. – jdweng Oct 04 '22 at 17:47
  • @jdweng The server is always in a listening state... netstat confirms this. ZeroMq handles selecting a random (I assume unused) port for the foreign IP, or at least it seems to. I'm not sure what you mean by checking the client. – Isaac Oct 04 '22 at 21:54
  • Are the client and server on the same machine or on two different machines? The listener (server) only listens on one port, then the client sends to the same port. In some apps the server may open multiple ports, but that is not the norm. There are some protocols where there is a main command/control port and then data is sent on other ports. The port numbers for the other ports can be sent over the command/control channel. But again, for simple communications this is not normally done. – jdweng Oct 04 '22 at 22:10
  • @jdweng The clients and server are all in the same process. It looks like (based on netstat) that multiple connections are made to the same port from different ports- the foreign IP is 127.0.0.1 in netstat and I have for instance 1 Listening connection and 6 Established connections. They're all connected to port 5555 in my case but connected from various ports. – Isaac Oct 05 '22 at 12:32
  • Only one connection is allowed with the same three parameters: 1) Source IP 2) Destination IP 3) Port number. When the client and server are on the same machine you have to make sure the client and server are using different destination IPs. So normally you have the server listen on loopback 127.0.0.1 and have the client connect to the IP of the machine (or computer name) to prevent conflict. I do not understand why the connection worked the first time. You have to be careful using localhost because on some machines localhost is configured as 127.0.0.1 and on others it has the IP of the machine. – jdweng Oct 05 '22 at 17:15
