If getaddrinfo fails once, it fails forever (even after network is ready)

Question

I am writing a C application which is run as a systemd service on boot (distro: Arch Linux) and which shall connect to a server. Because the application is run on boot, it sometimes happens that the network connection is not yet established. This naturally leads to a failure of the first function which requires one, which in my case is getaddrinfo.

So I thought that I would just write a loop which repeatedly calls getaddrinfo until it succeeds once the network is ready. Unfortunately I found that getaddrinfo keeps failing with "Name or service not known" even after the connection was established.

I am able to ping the server by its hostname but getaddrinfo still won't do it. If I stop the application and run it again, everything works fine. If the network connection is already established before the first call, getaddrinfo works fine too.

Apparently, if getaddrinfo failed once because the network was not ready, it will fail forever. It seems not to be aware of the now existing connection. When using the deprecated gethostbyname, the behaviour is the same.

What is the reason for this behaviour? Is there a way to force getaddrinfo to refresh internal variables (if such exist) or similar which might explain why the function still believes that there is no connection? Is there another function which I should call previously in order to check whether the network is ready?

I would like to avoid a delay which waits for some time, expecting the network to be connected afterwards. I would also prefer to check for a connection from within my application and not have a bash script first check for it and then start the application.

score 7 · Accepted Answer · answered May 20 '15 at 23:57

You can understand the answer by compiling the following test program, and following the instructions below:

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <unistd.h>

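int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        return 1;
    }

    /* Retry the lookup once per second, forever, printing each result. */
    while (1)
    {
        struct addrinfo *res;
        int rc = getaddrinfo(argv[1], "http", NULL, &res);

        printf("getaddrinfo returned %d\n", rc);

        /* The result list is only allocated on success. */
        if (rc == 0)
            freeaddrinfo(res);

        sleep(1);
    }
}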

Before you run this test program:

1. Connect to the network.
2. Temporarily rename /etc/resolv.conf to /etc/resolv.conf.save.
3. Start this test program, using a good hostname.
4. Soon after the test program starts printing error codes, rename /etc/resolv.conf.save back to /etc/resolv.conf.
5. Observe that the test program is still reporting DNS resolution failures.
6. If you CTRL-C and restart it, though, the test program will now report valid DNS resolution.

When you disconnect and reconnect from the network, your network stack rewrites and updates /etc/resolv.conf accordingly. This configuration file is needed by the DNS resolver in the C library. The C library reads the DNS configuration from /etc/resolv.conf the first time, and caches it. It doesn't check, with every lookup, if the contents of /etc/resolv.conf have changed.

Finally:

Your homework assignment is to add a call to res_init(), defined in resolv.h, to this test program, read the corresponding man page, and see what happens. That's your answer.
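For readers who want to compare their result against something, the following is one possible sketch of the idea, not a definitive implementation. It assumes glibc, where res_init() re-reads /etc/resolv.conf into the resolver state that getaddrinfo uses. The helper name resolve_with_retry and its parameters are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <netdb.h>
#include <resolv.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: tries getaddrinfo() up to max_attempts times.
 * res_init() is called before every attempt so that the resolver
 * re-reads /etc/resolv.conf instead of reusing its cached copy.
 * Returns 0 on success (the caller must freeaddrinfo(*res)),
 * otherwise the error code of the last getaddrinfo() call. */
static int resolve_with_retry(const char *host, const char *service,
                              int max_attempts, unsigned delay_seconds,
                              struct addrinfo **res)
{
    int rc = EAI_FAIL;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        res_init();  /* reload the DNS configuration */

        rc = getaddrinfo(host, service, NULL, res);
        if (rc == 0)
            return 0;

        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));

        /* Wait before the next attempt, but not after the last one. */
        if (attempt + 1 < max_attempts)
            sleep(delay_seconds);
    }
    return rc;
}
```

In the test program above, the body of the while loop could be replaced by a single call such as resolve_with_retry(argv[1], "http", 30, 1, &res). Calling res_init() on every attempt is the simplest choice; it could also be called only after a failed lookup.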

That actually works sometimes (but rather seldom). I assume that it works if the network connection is already established, but `resolv.conf` has not been updated yet when the application is started. Once `resolv.conf` is updated, the call to `res_init` in the loop loads the new configuration and then `getaddrinfo` succeeds. But if the network connection is not ready on application start, it still keeps failing. — kassiopeia, May 21 '15 at 11:39
My step-by-step demo involves starting the application when no DNS resolution is configured at all, which would be the case when no network connections exist. Did you try starting the demo code with the call to res_init() without a network connection, and then connecting to a network? This short little demo code is quite easy to test. There's no need to assume anything. — Sam Varshavchik, May 21 '15 at 23:55
When I first used your test program I only renamed `resolv.conf` but did not try to disconnect and reconnect the network. Now I tried it and it works. In my application I immediately tried to use `res_ninit` as the man pages told me that the other function is deprecated. After reverting back to `res_init` this also works fine. I probably overlooked something concerning the initialization of the `__res_state` structure. I will go over this again and see how I can make `res_ninit` behave the same as `res_init`. — kassiopeia, May 22 '15 at 08:12
