I am setting up a TCP connection to a server. The function below is invoked from an application in a loop, and sometimes I end up seeing the following error:

select() timed out after 4 seconds - Operation now in progress

which means select() returned 0, i.e. it timed out after 4 seconds without observing any activity on the file descriptor.

My understanding is that nonblocking mode is set after connect() in case it doesn't connect right away, with getsockopt() indicating whether the connect() call succeeded, but for some reason select() seems to be returning 0. Does it have to do with the delay being too small?

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int InitializeSocket(int sockType, int protocol, long timeout)
{
    int socketFd = socket(AF_INET, sockType, protocol);
    if (socketFd < 0)
    {
        // perror() takes a plain string, not a printf-style format
        fprintf(stderr, "Failed to create a client socket of type %d: %s\n", sockType, strerror(errno));
        return -1;
    }

    if (timeout > 0)
    {
        struct timeval sockTimeout = {.tv_sec = timeout, .tv_usec = 0};

        // setting the receive timeout
        if (setsockopt(socketFd, SOL_SOCKET, SO_RCVTIMEO, &sockTimeout, sizeof(sockTimeout)) < 0)
        {
            perror ("Failed to set the RX timeout");
            close(socketFd);
            return -1;
        }

        // setting the send timeout
        if (setsockopt(socketFd, SOL_SOCKET, SO_SNDTIMEO, &sockTimeout, sizeof(sockTimeout)) < 0)
        {
            perror ("Failed to set the TX timeout");
            close(socketFd);
            return -1;
        }
    }
    return socketFd;
}

int OpenTcpConnection(int serverTimeout, int port, const char *ipAddr)
{
    struct sockaddr_in address;

    int socketFd = InitializeSocket(SOCK_STREAM, 0, serverTimeout);
    if (socketFd == -1)
    {
        return -1;
    }

    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    address.sin_addr.s_addr = inet_addr(ipAddr);

    // get the existing file flags
    int arg = fcntl(socketFd, F_GETFL, 0);
    if (arg < 0)
    {
        perror ("Failed to get file status flags");
        close(socketFd);
        return -1;
    }

    // set the socket to nonblocking mode
    arg |= O_NONBLOCK;
    if (fcntl(socketFd, F_SETFL, arg) < 0)
    {
        perror ("Failed to set to nonblocking mode");
        close(socketFd);
        return -1;
    }

    // connect to the server
    int res = connect(socketFd, (struct sockaddr *) &address, sizeof(address));

    fd_set fdset;
    struct timeval tv;
    long selectTimeout = 4; // connect() timeout

    if (res < 0)
    {
        // the socket is nonblocking & the connection cannot be completed immediately
        if (errno == EINPROGRESS)
        {
            do
            {
                tv.tv_sec = selectTimeout;
                tv.tv_usec = 0;
                FD_ZERO(&fdset);
                FD_SET(socketFd, &fdset);
                res = select(socketFd+1, NULL, &fdset, NULL, &tv);

                if (res < 0 && errno != EINTR)
                {
                    fprintf(stderr, "Failed to monitor socket FD %d: %s\n", socketFd, strerror(errno));
                    close(socketFd);
                    return -1;
                }
                else if (res > 0)
                {
                    int so_error = 0;
                    socklen_t len = sizeof so_error;

                    // check whether connect() completed successfully
                    if (getsockopt(socketFd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
                    {
                        perror ("Error in getsockopt");
                        close(socketFd);
                        return -1;
                    }

                    if (so_error)
                    {
                        // the connect() result is in so_error, not in errno
                        fprintf(stderr, "Error in delayed connection: %s\n", strerror(so_error));
                        close(socketFd);
                        return -1;
                    }
                    break;
                }
                else
                {
                    fprintf(stderr, "select() timed out after %ld seconds - %s\n", selectTimeout, strerror(errno)); // ERROR HERE !!!
                    close(socketFd);
                    return -1;
                }
            } while(1);
        }
        else
        {
            perror ("Failed to connect");
            close(socketFd);
            return -1;
        }
    }
    return socketFd;
}
xyf
  • Note that it is not useful or appropriate to call `perror()` in the wake of `select()` returning anything other than -1. In any other case, the value of `errno` on which `perror()` will report is not reflective of the reason for `select()`'s return. – John Bollinger Jan 05 '22 at 20:10
  • Related: https://stackoverflow.com/a/17770524/2402272 – John Bollinger Jan 05 '22 at 20:24
  • Note that you are treating `EINTR` as a timeout instead of a retry. Change `if (res < 0 && errno != EINTR)` into `if (res < 0) { if (errno != EINTR) { ... } }` – Remy Lebeau Jan 05 '22 at 20:30

2 Answers

1

Per the select() man page:

select() returns the number of ready descriptors that are contained in
the descriptor sets, or -1 if an error occurred.  If the time limit
expires, select() returns 0.

... so if select() is returning 0, it's because no I/O operations were completed before your timeout was reached.

As for why no I/O operations were completed: if you were waiting for a TCP connection to complete, then the most likely explanation is that the TCP connection hadn't completed yet (perhaps because of a slow, overloaded, or broken network or server?).

Another (less likely, but possible) explanation might be that you are running your program under Windows, and under Windows, if a non-blocking connect() fails, that failure is indicated by setting a bit in the exceptions fd_set (i.e. the one that you would pass in as the fourth argument to select(), just before the timeout-argument). In the posted code you are passing in NULL for that argument, which means that under Windows you would have no way of knowing when your non-blocking TCP connection attempt has failed. (under other OS's, a failed connection would cause the socket to select as ready-for-read and ready-for-write also, making a connection-failure easier to react to)

Jeremy Friesner
  • So just to reiterate: basically `Operation now in progress` still indicates the attempt to connect to the server is in progress and if it's taking longer than expected, it could be either the server info (ip/port) is incorrect or for some reason server is taking long enough to send an ACK back to the client? – xyf Jan 05 '22 at 20:31
  • This code cannot be running on Windows, as `fcntl()` doesn't exist on Windows. – Remy Lebeau Jan 05 '22 at 20:31
  • No, @xyf, the "Operation now in progress" message printed by your code in the case where a timeout is reported is not meaningful. You can rely on the value of `errno` only in the cases where the docs say you can -- typically when a function call has returned `-1` to indicate an error, as in the case of `select()`. – John Bollinger Jan 05 '22 at 20:35
  • Understood that `Operation now in progress` isn't really indicative of an error but rather indicates that `select` failed to find any activity on the said file descriptor, which should hint at `connect()` not getting a valid response from the server? – xyf Jan 05 '22 at 20:42
  • @xyf `errno` was set by the `connect()` call when `connect()` returned -1; it was left unchanged by `select()` since `select()` never returned -1. So the error-string printed out by `perror()` in this case is unrelated to `select()`'s behavior. – Jeremy Friesner Jan 05 '22 at 20:56
  • right on, so the message `Operation now in progress` is indeed a result of `connect()` call. But I was just trying to understand the flow and whether that makes sense which was: `Operation now in progress` indicating the connection to the server is still pending which could be due to incorrect server info (ip/port). And I reckon it seems to be the case for me – xyf Jan 06 '22 at 01:48
  • In general your computer can’t know whether a given IP/port is correct or not; it can only send out packets and wait to see what kind of responses (if any) come back. – Jeremy Friesner Jan 06 '22 at 02:25
0

My understanding is that nonblocking mode is set after connect() in case it doesn't connect right away, with getsockopt() indicating whether the connect() call succeeded, but for some reason select() seems to be returning 0. Does it have to do with the delay being too small?

Nonblocking mode has to be set after the socket(2) call and before the connect(2) call. Otherwise connect() would block (never reaching the select() call) until the connection attempt itself fails, which normally takes over two minutes. Setting nonblocking mode first is the trick that lets you wait only 4 seconds for the connection.
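
The ordering described above (socket, then nonblocking mode, then connect, then a bounded wait) can be sketched as follows. This is an illustration under assumptions, not the asker's code: the name `connect_with_timeout` is hypothetical, `inet_pton()` is swapped in for `inet_addr()`, and a timeout is reported as `ETIMEDOUT`.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port, waiting at most timeout_sec.
 * Returns a connected (still non-blocking) socket, or -1 with errno set
 * (ETIMEDOUT if the wait expired, EINVAL if ip did not parse). */
int connect_with_timeout(const char *ip, int port, long timeout_sec)
{
    struct sockaddr_in addr;
    int fd, flags, res, saved, so_error = 0;
    socklen_t len = sizeof so_error;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* 1. Nonblocking mode goes on BEFORE connect(). */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    /* 2. Start the handshake; EINPROGRESS means it is still pending. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;
    if (errno != EINPROGRESS)
        goto fail;

    /* 3. Wait for writability, retrying if a signal interrupts select(). */
    do {
        fd_set wfds;
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        res = select(fd + 1, NULL, &wfds, NULL, &tv);
    } while (res < 0 && errno == EINTR);
    if (res < 0)
        goto fail;
    if (res == 0) {
        errno = ETIMEDOUT;
        goto fail;
    }

    /* 4. Writable only means the attempt finished; SO_ERROR says how. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        goto fail;
    if (so_error != 0) {
        errno = so_error;
        goto fail;
    }
    return fd;

fail:
    saved = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}
```

A caller might write `int fd = connect_with_timeout("192.0.2.10", 5000, 4);` (address and port are placeholders) and call `perror()` only when the result is -1, since `errno` is set on every failure path.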

A 4-second delay is normally short for a connection to a remote site. On a LAN, if you are not connected within 4 seconds, it means something is wrong.

My bet is that something is wrong: you are trying to connect to an endpoint that is not available (a non-existent host, or a server that is not listening on the address:port you are connecting to), you have forgotten to convert some fields of the sockaddr_in structure to network byte order (this appears to be correct in your snippet), or a firewall is blocking the connection (this may well be it). You are waiting for socketFd to become writable, which is correct, since the socket does not become writable while the connection is still pending. So you appear to be doing things correctly, and the likely cause is a mistyped address or a firewall cutting off access to the server.

Either way, a timeout in select() is not an error, just a timeout. The software you are using treats a 4-second timeout as fatal for the socket, so you have to ask the developer or check your network connection.

Luis Colorado
  • Also for UDP, I see it transmits the data fine, but its `recvfrom` is timing out with the error `Resource temporarily unavailable`. Do you see why? If it can transmit fine, the server settings must be fine as well – xyf Jan 08 '22 at 06:00