
I apologize if this question has been asked before. I am writing a non-blocking socket client using select() multiplexing. One thing that confuses me is that the non-blocking connect always appears to succeed, regardless of whether the server is online or offline. I searched many posts and followed their solutions, but none of them work on my Ubuntu Linux machine.

static void callback_on_select_write(int connect_fd) {

  // Client write event arrived;

  int error = -1;
  socklen_t len = sizeof(error);

  if(getsockopt(connect_fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1) {
    return;
  }

  // getsockopt puts the errno value for connect into error, so 0 means no error.
  if(error == 0) {
      // Connection ok.
  }
  else {
    cerr << "Failed to connect\n";
    return;
  }

  // Ready to write/read

}

Every time select() returns, it invokes this callback, which always succeeds, i.e., it reaches the "Ready to write/read" block instead of reporting a failure through cerr. Why can this happen? How do I design a portable mechanism to detect whether the connection really succeeded? Below is how I create the connector.

int make_socket_client(const ::std::string& hostname, const ::std::string& port) {

  struct addrinfo hints;
  struct addrinfo* res {nullptr};
  struct addrinfo* ptr {nullptr};

  memset(&hints, 0, sizeof(struct addrinfo));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_protocol = IPPROTO_TCP;

  int rv;
  int connector;

  if((rv = getaddrinfo(hostname.c_str(), port.c_str(), &hints, &res)) != 0) {
    return -1;
  }

  // Try to get the first available client connection.
  for(ptr = res; ptr != nullptr; ptr = ptr->ai_next) {

    // Ignore undefined ip type.
    if(ptr->ai_family != AF_INET && ptr->ai_family != AF_INET6) {
      continue;
    }

    // Create the client socket for this candidate address.
    if((connector = socket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol)) == -1){
      continue;
    }

    make_fd_nonblocking(connector);

    if(connect(connector, (struct sockaddr*)ptr->ai_addr, ptr->ai_addrlen) < 0) {
      // This is what we expect.
      if(errno == EINPROGRESS) {
        break;
      }
      else {
        close(connector);
        continue;
      }
    }
    else {
      break;
    }
  }

  freeaddrinfo(res);

  if(ptr == nullptr) {
    return -1;
  }

  return connector;  
}
Jes
  • Surely you should be looking at the SO_ERROR on `connect_fd`? And what is `event`? This code is not complete enough to answer a question on. – user207421 Mar 27 '16 at 23:13
  • Sorry. it should be connect_fd only. – Jes Mar 27 '16 at 23:16
  • Did the original `connect()` return zero? – user207421 Mar 27 '16 at 23:19
  • I posted additional code showing how I create the non-blocking connector. – Jes Mar 27 '16 at 23:20
  • It always returns valid connector with errno == EINPROGRESS. Then I use select to figure out when it is writable. – Jes Mar 27 '16 at 23:21
  • I don't know what a 'valid connector' is. Do you mean it always returns -1 with `errno == EINPROGRESS`? – user207421 Mar 27 '16 at 23:22
  • Yes. The connect always returns -1 with `EINPROGRESS`. What I meant is that the connector returned by the socket() call is valid. – Jes Mar 27 '16 at 23:24
  • Well, back in the day before SO_ERROR became ubiquitous we used to issue a second connect after the socket became writable, which would either succeed or fail. But really I find little use for a non-blocking connect. I always do it in blocking mode, and then use `select()` just to control the I/O. Actually I find little use for client-side non-blocking mode at all. – user207421 Mar 27 '16 at 23:32
  • @EJP how do you handle it when the user wants to quit the client (right now), but a client thread is blocked inside connect(), for potentially a long time? – Jeremy Friesner Mar 28 '16 at 01:40
  • It's not 'potentially a long time', only about a minute, and that only happens on a connect that is going to timeout, and that shouldn't happen at all unless there is a misconfiguration. Connection refusals are very quick. – user207421 Mar 28 '16 at 04:10

1 Answer


Every time select() returns, it invokes this callback, which always succeeds, i.e., it reaches the "Ready to write/read" block instead of reporting a failure through cerr. Why can this happen?

While the asynchronous TCP connect is in progress (as indicated by -1/EINPROGRESS from the connect() call), you should pass the socket to select() as part of its ready-for-write socket set, so that select() will return when the socket indicates it is ready-for-write.

When the TCP connection succeeds-or-fails, select() will return that the socket is ready-for-write(*). When that happens, you need to figure out which of the two possible outcomes (success or failure) has occurred.
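As a minimal sketch of the two steps just described, the code below starts a non-blocking connect, waits for the socket to become writable, and then decides which outcome occurred. The helper names `wait_for_connect` and `connect_with_timeout` and the timeout parameter are illustrative assumptions, not part of the original post. For the outcome test it uses the SO_ERROR check from the question; another check, such as the function shown further down, could be substituted.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: waits up to timeout_ms for a connect() that returned
// -1/EINPROGRESS on fd to finish.
// Returns 0 if the connection succeeded, otherwise an errno-style code.
int wait_for_connect(int fd, int timeout_ms) {
  assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot track larger fds

  fd_set write_set;
  FD_ZERO(&write_set);
  FD_SET(fd, &write_set);

  struct timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  int ready = select(fd + 1, nullptr, &write_set, nullptr, &tv);
  if (ready < 0) return errno;       // select() itself failed (e.g. EINTR)
  if (ready == 0) return ETIMEDOUT;  // still connecting after timeout_ms

  // Writable means "finished", not "succeeded": fetch the pending error.
  int error = 0;
  socklen_t len = sizeof(error);
  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) < 0) return errno;
  return error;
}

// Hypothetical helper: starts a non-blocking connect to addr and waits for
// the outcome. Returns a connected fd, or -1 with errno set.
int connect_with_timeout(const sockaddr_in& addr, int timeout_ms) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  int error = 0;
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
    error = errno;
  } else if (connect(fd, reinterpret_cast<const sockaddr*>(&addr),
                     sizeof(addr)) < 0) {
    // EINPROGRESS is the expected case; anything else is an immediate failure.
    error = (errno == EINPROGRESS) ? wait_for_connect(fd, timeout_ms) : errno;
  }

  if (error != 0) {
    close(fd);
    errno = error;  // restore the connect error after close()
    return -1;
  }
  return fd;
}
```

In a real event loop the select() call would live in the main loop rather than in a helper; it is folded into one function here only to keep the sketch short.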

Below is the function I call when select() reports an asynchronously-connecting socket as ready-for-write.

// Call this when select() has indicated that (fd) is ready-for-write because
// (fd)'s asynchronous-TCP connection has either succeeded or failed.
// Returns true if the connection succeeded, false if the connection failed.
// If this returns true, you can then continue using (fd) as a normal
// connected/non-blocking TCP socket.  If this returns false, you should
// close(fd) because the connection failed.
bool FinalizeAsyncConnect(int fd)
{
#if defined(__FreeBSD__) || defined(BSD)
   // Special case for FreeBSD7, for which send() doesn't do the trick
   struct sockaddr_in junk;
   socklen_t length = sizeof(junk);
   memset(&junk, 0, sizeof(junk));
   return (getpeername(fd, (struct sockaddr *)&junk, &length) == 0);
#else
   // For most platforms, the code below is all we need
   char junk;
   return (send(fd, &junk, 0, 0L) == 0);
#endif
}

(*) Side note: Things are slightly different under Windows, because Windows likes to do things its own way: Under Windows, a successful asynchronous connect() is indicated as described above, but if you want to be notified about a failed asynchronous connect() attempt under Windows, you need to register your socket under the "except" fd_set also, as it is the "except" fd_set that Windows will use to communicate a failed asynchronous connect().
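The side note can be sketched as follows: the socket is placed in both the write set and the except set, and the except set is checked first. This is only an illustration of the fd_set layout under stated assumptions: it uses POSIX headers and an `int` descriptor so that it compiles on Unix-like systems, whereas real Winsock code would use `SOCKET` and `<winsock2.h>`. The names `ConnectState` and `poll_connect` are hypothetical.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical result type for this sketch.
enum class ConnectState { Pending, Writable, Exception, Error };

// Watches fd in both the write set and the except set for up to timeout_ms.
ConnectState poll_connect(int fd, int timeout_ms) {
  assert(fd >= 0 && fd < FD_SETSIZE);

  fd_set write_set;
  fd_set except_set;
  FD_ZERO(&write_set);
  FD_ZERO(&except_set);
  FD_SET(fd, &write_set);
  FD_SET(fd, &except_set);

  struct timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  int ready = select(fd + 1, nullptr, &write_set, &except_set, &tv);
  if (ready < 0) return ConnectState::Error;     // select() itself failed
  if (ready == 0) return ConnectState::Pending;  // nothing happened yet

  // Per the side note, Windows signals a failed connect through this set.
  if (FD_ISSET(fd, &except_set)) return ConnectState::Exception;

  // Writable: the connect attempt has finished. Outside Windows the caller
  // still has to determine whether it succeeded or failed.
  return ConnectState::Writable;
}
```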

Jeremy Friesner
  • Thank you Jeremy. However, on my OS X system, calling send while the peer is offline raises `SIGPIPE` and the program exits, so the function `FinalizeAsyncConnect` never returns. How can I get this resolved? – Jes Mar 28 '16 at 02:31
  • Disable SIGPIPE by calling signal(SIGPIPE, SIG_IGN) at the top of main(), as described here: http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly/108192#108192 – Jeremy Friesner Mar 28 '16 at 03:07
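
The suggestion in the last comment can be sketched as below. Once SIGPIPE is ignored, a send() to a peer that has gone away returns -1 with errno set to EPIPE instead of terminating the process. The helper names `ignore_sigpipe` and `send_or_errno` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: call once near the top of main().
// Returns false if the signal disposition could not be changed.
bool ignore_sigpipe() {
  return signal(SIGPIPE, SIG_IGN) != SIG_ERR;
}

// Hypothetical helper: sends len bytes and reports the outcome as a code.
// Returns 0 if send() succeeded, otherwise the errno value it produced.
int send_or_errno(int fd, const void* data, std::size_t len) {
  assert(fd >= 0);
  return (send(fd, data, len, 0) >= 0) ? 0 : errno;
}
```

With the signal ignored, a failed write shows up as an ordinary return code that the caller can test, for example `send_or_errno(fd, "x", 1) == EPIPE`.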