I have written a C program that acts as a primitive server, and I need help with a socket/resource allocation problem.

The main() function opens a single listening socket, calls accept() for every incoming connection, then spins up a detached thread to handle the actual processing of the client request:

#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

typedef struct {
        int sock;
        struct sockaddr address;
        socklen_t addr_len;             // accept() expects a socklen_t *, not an int *
} connection_t;

void handle_sigint(int signo);          // Defined elsewhere (not shown)
void * process(void * ptr);             // Thread entry point, shown below


int main(int argc, char ** argv)
{
        int             sock = -1;
        struct          sockaddr_in address;
        int             port = 12345;
        connection_t*   connection;
        pthread_t       thread;

        signal(SIGINT, handle_sigint);

        // Create the listening socket (0 is a valid descriptor, so test for < 0)
        sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (sock < 0){
                fprintf(stderr, "%s: error: cannot create socket\n", argv[0]);
                return -3;
        }

        // Bind socket to port
        memset(&address, 0, sizeof(address));
        address.sin_family = AF_INET;
        address.sin_addr.s_addr = INADDR_ANY;
        address.sin_port = htons(port);
        if (bind(sock, (struct sockaddr *)&address, sizeof(struct sockaddr_in)) < 0){
                fprintf(stderr, "%s: error: cannot bind socket to port %d\n", argv[0], port);
                return -4;
        }

        // Listen on the port
        if (listen(sock, 5) < 0){
                fprintf(stderr, "%s: error: cannot listen on port\n", argv[0]);
                return -5;
        }

        printf( "...starting loop...\n" );
        while (1){
                // Accept incoming connections
                connection = (connection_t *)malloc(sizeof(connection_t));
                if (connection == NULL){
                        fprintf(stderr, "%s: error: out of memory\n", argv[0]);
                        return -6;
                }
                connection->addr_len = sizeof(connection->address);
                connection->sock = accept(sock, &connection->address, &connection->addr_len);

                if (connection->sock < 0){
                        printf("Connection FAILED :: Value of errno: %d\n ", errno);
                        free(connection);
                }
                else if (pthread_create(&thread, 0, process, (void *)connection) != 0){
                        // Thread could not start: release the descriptor here or it leaks
                        close(connection->sock);
                        free(connection);
                }
                else{
                        // Thread is running; do not wait for it
                        pthread_detach(thread);
                }
        }
        return 0;
}

The above should be C Network Programming 101. And it does work great… until I start sending a large number of client requests. When I do, I see this:

me@mylinux:/home/me/myServer
...starting loop...
 Connection FAILED :: Value of errno: 24
 Connection FAILED :: Value of errno: 24
 Connection FAILED :: Value of errno: 24
 Connection FAILED :: Value of errno: 24

Hmm. After working like a champ for a bunch of earlier connections, accept() is suddenly choking.

A quick Google search reveals that Errno 24 is EMFILE: “Too many open files.” And a search of other StackOverflow posts (here) suggests that each time my code calls accept(), this creates a file descriptor for the new open connection. This means each new thread lays claim to a new file descriptor.
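To make the EMFILE behavior concrete, here is a minimal sketch that deliberately leaks descriptors until open() fails. The function name errno_when_fds_run_out and the choice of /dev/null are assumptions for illustration only; the sketch temporarily lowers the soft descriptor limit so the failure shows up quickly, then restores it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical demo helper (not from the original program).
 * Lowers the soft RLIMIT_NOFILE to `limit`, opens /dev/null repeatedly
 * without closing, and returns the errno reported by the first failing
 * open(). Returns -1 if the limits could not be read or changed, and 0
 * if open() never failed. All leaked descriptors are closed and the
 * original limit is restored before returning. */
int errno_when_fds_run_out(rlim_t limit)
{
    struct rlimit saved, lowered;
    int fds[1024];
    int n = 0;
    int err = 0;

    if (limit > 1024 || getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;

    lowered = saved;
    lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    while (n < 1024) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;        /* the same code accept() would report */
            break;
        }
        fds[n++] = fd;          /* "leak": keep it open on purpose */
    }

    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```

Calling errno_when_fds_run_out(32) should hit the limit after a few dozen opens. accept() fails the same way once every descriptor slot is taken, whether the slots are held by sockets or by ordinary files.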

I'm guessing that the problem is that the threads do not clean up after themselves. They read from the socket, do their work, then terminate without informing the system that their FD is no longer needed. Thus, the system maintains all those FDs and I get an avalanche of Errno 24s after the first 4,096 connections.

(I've run “ulimit -n 4096” on my Linux box.)
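The limit that the running process actually sees can also be checked from inside the program, which guards against the ulimit setting applying to a different shell than the one that launched the server. A minimal sketch, where get_fd_limits is a hypothetical helper name and not part of the original code:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft and hard limits on open file
 * descriptors for the calling process. Returns 0 on success, -1 on
 * failure (after printing the reason). */
int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    *soft = rl.rlim_cur;    /* the value "ulimit -n" reports */
    *hard = rl.rlim_max;    /* ceiling the soft limit may be raised to */
    return 0;
}
```

Printing the soft limit once at startup, before the accept() loop, makes it easy to compare against the number of connections handled before the first failure.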

So here’s the process() function that each thread runs; I’ve omitted the string-parsing part because it’s out of scope here. Note that after the thread does its work, it tries to clean up after itself:

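void * process(void * ptr)
{
        char * buffer;
        int len = 0;
        int got = 0;
        ssize_t n;
        connection_t * conn;          // See above for this data type
        long addr = 0;

        conn = (connection_t *)ptr;

        // Read length of message; treat an error or short read as "no message"
        n = read(conn->sock, &len, sizeof(int));
        if (n != (ssize_t)sizeof(int))
                len = 0;

        if (len > 0)
        {
                addr = (long)((struct sockaddr_in *)&conn->address)->sin_addr.s_addr;
                buffer = (char *)malloc((len+1)*sizeof(char));

                // Read the message from the socket; read() may return fewer bytes than requested
                while (buffer != NULL && got < len)
                {
                        n = read(conn->sock, buffer + got, len - got);
                        if (n <= 0)
                                break;          // Error or peer closed the connection
                        got += (int)n;
                }

                // Parse client message only if all of it arrived
                if (buffer != NULL && got == len)
                {
                        buffer[len] = 0;
                        parseCliMsg( buffer, len );
                }

                free(buffer);
        }

        // Close socket and clean up
        if (close(conn->sock) < 0)
                perror("close");
        free(conn);
        return NULL;
}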

Does anyone see where I’m going wrong?

John Kugelman
Pete
  • 2
    No way to be sure without more info. But it does look like you're cleaning up the file descriptor correctly at the end of the thread. My guess therefore is that maybe those threads are still running? Still waiting in one of the `read` calls? Or `parseCliMsg` hangs somewhere? It would help if you collect and check return values. How many bytes are you reading at each step? Is `close` returning an error? Maybe the connection structure got corrupted and `close` is failing. Also, there's no `myNum` in your connection structure, so it's unclear if we're seeing what your code is actually doing. – Gil Hamilton Oct 07 '19 at 20:01
  • @GilHamilton Thanks Gil. I'm at the end of my work day, but will work to clarify your questions. You make great points. FYI, myNum is an int that's in the connection_t struct. But because it wasn't relevant to this example, I edited it out. Obviously I missed one instance; will clean that up! – Pete Oct 07 '19 at 20:38
  • 1
    Consider using lsof (/usr/sbin/lsof -p PID) to get a list of open file descriptors when the program starts spitting out the 'Error 24'. – dash-o Oct 07 '19 at 20:49
  • 3
    You're never checking the return value from `read()` so you really don't know what's happening in your threads. – Andrew Henle Oct 07 '19 at 21:26
  • @GilHamilton On your suggestion, I disabled my parseCliMsg() function, and I'll be danged if my code didn't run flawlessly for over an hour. It processed over a million client requests without so much as a hiccup. So to my astonishment, the problem lies there. If you write your response as a formal answer, I'll mark it as the solution... because you were right! – Pete Oct 08 '19 at 17:33
  • @dash-o Ah, great idea. I had to google that command, nice to include it in my toolbelt. Thanks! – Pete Oct 08 '19 at 17:34
  • @AndrewHenle Thanks Andrew. You're right, I should be checking the retval of read() and all those other functions. That's the flaw of nabbing example code you find in a google search, I guess. Many thanks! – Pete Oct 08 '19 at 17:35
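
As a programmatic complement to the lsof suggestion in the comments, the server could count its own open descriptors and log the number periodically. This is a Linux-specific sketch that assumes /proc is mounted; count_open_fds is a hypothetical helper name, not something from the original program.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper (Linux only, assumes /proc is mounted).
 * Counts the entries in /proc/self/fd, i.e. the descriptors this
 * process currently holds open. Returns -1 if the directory cannot
 * be opened. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *entry;
    int count = 0;

    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;           /* skip "." and ".." */
        count++;
    }
    closedir(dir);
    return count - 1;           /* exclude the descriptor opendir() itself held */
}
```

If the value climbs steadily while the server is under load and never comes back down, some code path is opening descriptors without closing them; a stable value points instead at threads that are still alive and holding their sockets.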

0 Answers