
I am using non-blocking sockets to connect to a server.
In a specific test scenario, the server is down, which means a TCP SYN goes out, but there is no response and there can never be an established connection.

In this setup, select usually times out after 2 seconds and returns 0. This happens most of the time, and it seems correct.

However, in roughly 5% of cases, select immediately returns 1 (the socket is marked readable in the fd set).
But when I read(2) from the socket, it returns -1 with errno set to `ENETUNREACH` ("Network is unreachable").

```cpp
sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// sockfd checked and > 0
// set non-blocking

struct timeval tv{};
tv.tv_sec = 2;

int ret = connect(sockfd, addr, addrlen); // addr set elsewhere
if (ret < 0 && errno == EINPROGRESS)
{
    fd_set cset;
    FD_ZERO(&cset);
    FD_SET(sockfd, &cset);

    ret = select(sockfd + 1, &cset, nullptr, nullptr, &tv);
    // returns 1 sometimes
}
```

In the first version of this post, I incorrectly stated that in the error case there is only one TCP SYN on the network (no retransmission).
This is not true: in both the error and the non-error case, a TCP SYN goes out and is retransmitted after 1 second.

What might cause this, and is there a way to get consistent behavior from select?

curiousguy12
  • it's set non-blocking, but I did not include it in the code (there's comment though) – curiousguy12 May 31 '21 at 17:41
  • [Does this address your question?](https://stackoverflow.com/questions/8417821/non-blocking-socket-select-returns-1-after-connect) – ryyker May 31 '21 at 17:41
  • [Similar here](https://stackoverflow.com/a/5843852/645128) (also linked in above link) – ryyker May 31 '21 at 17:44
  • @ryyker The question is related, but it's a different scenario. There, a TCP RST is apparently returned, which makes the socket readable. In my case, there is no response from the network – curiousguy12 May 31 '21 at 17:44
  • Have you considered `select`ing for *writing* instead of reading? Does that change anything? – Marco Bonelli May 31 '21 at 17:47
  • "there is no response from the network" how do you know? Have you snooped on the wire? All packets or just TCP? – n. m. could be an AI May 31 '21 at 17:48
  • @n.'pronouns'm.Yes, I ran a `tcpdump` with only the destination IP in the filter. TCP SYN going out and nothing else – curiousguy12 May 31 '21 at 17:50
  • @MarcoBonelli Just tried _writing_ instead of _reading_, the result is the same – curiousguy12 May 31 '21 at 17:53
  • The "network unreachable" reply doesn't come from the destination IP. It is unreachable! – n. m. could be an AI May 31 '21 at 17:54
  • @n.'pronouns'm. well, it's not that it "comes" from anywhere other than [the kernel itself](https://elixir.bootlin.com/linux/v5.4/source/net/ipv4/tcp_ipv4.c#L238) though, doubt you'd see anything on the wire. – Marco Bonelli May 31 '21 at 17:58
  • The first question ryyker refers to is about a client connecting to an IP where nothing is listening on the port. In that case the server returns a TCP RST, which leads to the behavior observed in that question. In my case, the remote IP does not exist at all – curiousguy12 May 31 '21 at 18:02
  • @MarcoBonelli You are likely to see an ICMP packet from the gateway, because your host probably doesn't do any routing. – n. m. could be an AI May 31 '21 at 18:08
  • @n.'pronouns'm. oh, so you're saying that the host receives a "Destination Unreachable" ICMP and that the kernel marks the socket as ready for I/O after seeing that? That's interesting... – Marco Bonelli May 31 '21 at 18:11
  • @MarcoBonelli Yeah I think so, what would be other reasons? – n. m. could be an AI May 31 '21 at 18:13
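The question's code leaves the non-blocking step as a `// set non-blocking` comment, and the asker confirms in the comments that it is done. For anyone reproducing the scenario, the following shows one common way to do it on POSIX systems. It is a sketch and an assumption, not necessarily what the asker used, and `set_nonblocking` and `make_nonblocking_tcp_socket` are hypothetical helper names.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Hypothetical helper (not the asker's code): put an existing descriptor
// into non-blocking mode. Returns true on success, false if fcntl() fails.
bool set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   // current file status flags
    if (flags < 0)
        return false;
    // Add O_NONBLOCK while keeping any flags that are already set.
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Hypothetical convenience wrapper: create an IPv4 TCP socket and make it
// non-blocking. Returns the descriptor, or -1 on failure.
int make_nonblocking_tcp_socket()
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -1;
    if (!set_nonblocking(fd))
    {
        close(fd);   // don't leak the descriptor on failure
        return -1;
    }
    return fd;
}
```

On Linux (2.6.27 and later), passing `SOCK_NONBLOCK` in the `type` argument of `socket()` does the same in one call. The `fcntl()` route shown here is the portable POSIX one.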


The correct way to determine whether a non-blocking connect() has finished is to ask select() for writability, not readability. This is clearly stated in the connect() documentation:

> `EINPROGRESS`
>
> The socket is nonblocking and the connection cannot be completed immediately. (UNIX domain sockets failed with EAGAIN instead.) It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).

Testing the socket for readability before you know the connection has actually been established is not a reliable way to detect that connect() has finished.

Try this instead:

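```cpp
sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// sockfd checked and >= 0
// set non-blocking

int ret = connect(sockfd, addr, addrlen); // addr set elsewhere
if (ret < 0)
{
    if (errno != EINPROGRESS)
    {
        // connect() failed immediately
        close(sockfd);
        sockfd = -1;
    }
    else
    {
        // Wait for the attempt to finish: completion is reported
        // as WRITABILITY, not readability.
        fd_set cset;
        FD_ZERO(&cset);
        FD_SET(sockfd, &cset);

        struct timeval tv{};
        tv.tv_sec = 2;

        ret = select(sockfd + 1, nullptr, &cset, nullptr, &tv);
        if (ret <= 0)
        {
            // 0 = timed out, -1 = select() itself failed
            close(sockfd);
            sockfd = -1;
        }
        else
        {
            // The attempt has finished; SO_ERROR says whether it succeeded.
            int errCode = 0;
            socklen_t len = sizeof(errCode);
            if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &errCode, &len) < 0 || errCode != 0)
            {
                // getsockopt() failed, or errCode holds the reason
                // (e.g. ENETUNREACH, ECONNREFUSED)
                close(sockfd);
                sockfd = -1;
            }
        }
    }
}

if (sockfd != -1)
{
    // use sockfd as needed (read(), etc) ...
    close(sockfd);
}
```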
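The `connect()` man page quoted above also allows `poll(2)` for completion. The following is a minimal sketch of the same writability-then-`SO_ERROR` check using `poll()`, assuming the socket is already non-blocking. `connect_with_timeout` is a hypothetical name, and returning `ETIMEDOUT` on a timeout is this sketch's own convention, not something `connect()` does.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Hypothetical helper: start a connect() on an already non-blocking socket
// and wait up to timeout_ms for it to finish. Returns 0 on success, or an
// errno value on failure (ETIMEDOUT if poll() times out).
int connect_with_timeout(int fd, const sockaddr* addr, socklen_t addrlen, int timeout_ms)
{
    if (connect(fd, addr, addrlen) == 0)
        return 0;                        // connected immediately
    if (errno != EINPROGRESS)
        return errno;                    // failed immediately

    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLOUT;                // completion shows up as writability

    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);   // simplification: restarts the full timeout
    if (n < 0)
        return errno;                    // poll() itself failed
    if (n == 0)
        return ETIMEDOUT;                // no answer within the timeout

    // The attempt has finished, successfully or not; SO_ERROR says which.
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}
```

With this shape, the asker's occasional failure should come back as `ENETUNREACH` from the helper instead of as a readable socket whose `read()` then fails.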
Remy Lebeau
  • Well of course you can test for readability, it will just indicate readability (i.e. data received from the peer) rather than a successfully established connection. – n. m. could be an AI May 31 '21 at 19:28
  • Thanks, I changed the code per your suggestions. `getsockopt` still reports `network unreachable` in about 5% of the connection attempts. The strange thing is that even with that error, I can see the TCP SYN go out on the network. I can try `ppoll`, but if this is a message from the kernel, it may trigger the same behavior – curiousguy12 May 31 '21 at 19:43
  • @curiousguy12 I never said a "network unreachable" error wouldn't happen. But you can't be sure the error is available until the socket reports writability, not readability. – Remy Lebeau May 31 '21 at 19:57
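The point that readability only reflects what has happened on the socket can be made concrete. On Linux, as far as I know, `select()` also puts a socket in the read set (and the write set) once its connection attempt has failed, because a pending error counts as readiness. That is consistent with the asker's `select()` returning 1 and the following `read()` failing with `ENETUNREACH`. The sketch below uses a refused connection on loopback as a reproducible stand-in for the unreachable network, which cannot be triggered deterministically. `pending_socket_error` and `wait_readable` are hypothetical helpers.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Hypothetical helper: after select() has reported the socket as ready
// (readable OR writable), fetch and clear the pending socket error.
// Returns 0 if no error is pending, otherwise an errno value such as
// ENETUNREACH or ECONNREFUSED.
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;   // getsockopt() itself failed (e.g. EBADF)
    return err;
}

// Hypothetical helper: wait up to `seconds` for fd to appear in the READ
// set, as the question's code does. Returns select()'s result.
int wait_readable(int fd, long seconds)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    timeval tv{};
    tv.tv_sec = seconds;
    return select(fd + 1, &rset, nullptr, nullptr, &tv);
}
```

Reading `SO_ERROR` clears it, so query it once and keep the result. Whichever set `select()` reports the socket in, `SO_ERROR` is what distinguishes a completed connection from a failed one.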