
So, I have a client that attempts to connect to a server. The IP address and port are read from a configuration file. I need the program to fail smoothly if something in the config file is incorrect. I connect to the server using the following code:

if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
    perror("client: connect");
    close(sockfd);
    continue;
}

If the user attempts to connect to a server on the subnet that is not accepting connections (i.e. is not present), then the program fails with No route to host. If the program attempts to connect to a server that is not on the subnet (i.e. the configuration is bad), then the program hangs at the connect() call. What am I doing incorrectly? I need this to provide some feedback to the user that the application has failed.

cirrusio
  • Be careful: if you close the socket and then do a `continue;`, you will probably reach `connect(2)` again with the socket closed. Once the socket is closed, the descriptor in `sockfd` is invalid. And please read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). It is very important that you post compilable and verifiable code (testable, not snippets). It's impossible to know what you are doing incorrectly, because your snippet of code is completely correct. The problem you have is elsewhere in the hidden code. – Luis Colorado Aug 25 '18 at 14:40

2 Answers


You're not doing anything wrong. TCP is designed for reliability in the face of network problems, so if it doesn't get a response to its initial connection request, it retries several times in case the request or response was lost in the network. With the default parameters on Linux, it takes a minute or more to give up. Then it reports a failure with the Connection timed out error.

If you want to detect the failure more quickly, see the question "C: socket connection timeout".

Barmar
  • Yes, thanks, very good point. Yes, we all experience TCP/IP timeouts, but he did say hangs, though not for how long. Chances are you are spot on. One of the best payoffs for answering questions is what you learn yourself (I mean me!). – bcperth Aug 24 '18 at 01:37
  • I know he said hang, but he probably just didn't wait long enough. After 10 seconds, it will seem like it's hung. – Barmar Aug 24 '18 at 13:48
  • Looks like it times out after about 3 minutes. I will take a look at the link you sent. Thanks! – cirrusio Aug 24 '18 at 15:18
  • Does this default timeout for connect vary among different systems? Is there any standard for this? – abhiarora Dec 15 '19 at 20:53
  • @abhiarora It's a kernel configuration option, I think 3 minutes is typical. I don't know if there's a standard. – Barmar Dec 15 '19 at 21:25

Normally we don't use continue inside an if statement unless the if statement is inside a loop, which you are not showing. Assuming there is an outer loop, it is responsible for what happens next: either it keeps re-entering the if block (to try to connect again) or it skips past it.

Note also you are closing sockfd inside the if block so if your loop is re-entering the if block to do retries, then it needs to create a new socket first.
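To illustrate, here is a minimal sketch of a loop that creates a fresh socket for every attempt. It assumes the outer loop walks the address list returned by getaddrinfo(), which is a guess about the hidden code; the function name connect_to_host and its parameters are hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try each address that getaddrinfo() returns for
 * host:port, creating a new socket for every attempt.
 * Returns a connected descriptor, or -1 if every candidate failed. */
int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints, *servinfo, *p;
    int sockfd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int rv = getaddrinfo(host, port, &hints, &servinfo);
    if (rv != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return -1;
    }

    for (p = servinfo; p != NULL; p = p->ai_next) {
        /* A closed descriptor cannot be reused, so open a new one here. */
        sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sockfd == -1) {
            perror("client: socket");
            continue;
        }
        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
            perror("client: connect");
            close(sockfd);
            sockfd = -1;              /* mark as unusable before moving on */
            continue;
        }
        break;                        /* connected */
    }

    freeaddrinfo(servinfo);

    if (sockfd == -1)
        fprintf(stderr, "client: failed to connect to %s:%s\n", host, port);
    return sockfd;
}
```

The caller only has to check for -1 to report a bad configuration to the user, for example after connect_to_host("192.0.2.10", "3490"), where both values are placeholders.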

I suggest reading some sample code for client-side and server-side socket connections to get a better feel for how it works: http://www.cs.rpi.edu/~moorthy/Courses/os98/Pgms/socket.html

If all else fails, please provide the code around the if block and also state how you want to "fail smoothly". One way to fail "abruptly" would be to swap the continue statement for a call to exit() ;-)

EDIT: After reading Barmar's answer and his comment you also need to be aware of this:

If the initiating socket is connection-mode, then connect() shall attempt to establish a connection to the address specified by the address argument. If the connection cannot be established immediately and O_NONBLOCK is not set for the file descriptor for the socket, connect() shall block for up to an unspecified timeout interval until the connection is established. If the timeout interval expires before the connection is established, connect() shall fail and the connection attempt shall be aborted.

Also:

If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS], but the connection request shall not be aborted, and the connection shall be established asynchronously. Subsequent calls to connect() for the same socket, before the connection is established, shall fail and set errno to [EALREADY]

When you say "the program hangs", did you mean forever, or for a period that might be explained by a TCP/IP timeout?

If this and Barmar's answer are still not enough, then it would help to see the surrounding code, as suggested, to determine whether the socket is blocking or non-blocking, etc.

bcperth
  • Thanks for your reply. Sorry, this section is basically cut and paste from [Beej's guide](https://beej.us/guide/bgnet/), but your points are valid. I am just doing some initial tests now so I haven't really changed anything. `O_NONBLOCK` is not explicitly set so this should be blocking. As stated in the original post, `connect()` does not return for several minutes. Looks like @Barmar was spot on. Thanks for your help. – cirrusio Aug 24 '18 at 15:16
  • OK...so I should say that the `continue` is here because we are looping through the linked list of `addrinfo` structures returned by `getaddrinfo()`. If there is an error in this `connect()`, it will skip a `break` statement in the `for` loop. There is probably a better way to do this, but it is currently the least of my concerns. Thanks for pointing this out. – cirrusio Aug 24 '18 at 15:22
  • All good! As a matter of interest, are you now fully able to explain the different behaviour between connecting to a server on the subnet that is not accepting connections (immediate fail) and to a server that is not on the subnet (hangs)? I will read Beej's code later when I get some time, but it probably never reaches the connect() in the former case? – bcperth Aug 24 '18 at 23:18