90

I am working on a school project where I had to write a multi-threaded server, and now I am comparing it to Apache by running some tests against it. I am using autobench to help with that, but after I run a few tests, or if I give it too high a rate (around 600+) at which to make the connections, I get a "Too many open files" error.

After I am done dealing with a request, I always call close() on the socket. I have also tried using the shutdown() function, but nothing seems to help. Is there any way around this?
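For reference, the per-connection teardown I do looks roughly like this (conn_fd stands in for the accepted socket):

#include <sys/socket.h>
#include <unistd.h>

/* Rough sketch of the cleanup described above: shutdown() ends the TCP
   conversation, but only close() actually releases the file descriptor. */
void finish_connection(int conn_fd)
{
    shutdown(conn_fd, SHUT_RDWR);
    close(conn_fd);
}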

Yu Hao
Scott

13 Answers

88

There are multiple places where Linux can have limits on the number of file descriptors you are allowed to open.

You can check the following:

cat /proc/sys/fs/file-max

That will give you the system-wide limit on file descriptors.

On the shell level, this will tell you your personal limit:

ulimit -n

This can be changed in /etc/security/limits.conf - it's the nofile param.

However, if you're closing your sockets correctly, you shouldn't receive this unless you're opening a lot of simultaneous connections. It sounds like something is preventing your sockets from being closed appropriately. I would verify that they are being handled properly.
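If you want a quick way to see that per-process limit in action, a minimal sketch is to burn descriptors on purpose until the kernel refuses with EMFILE - it just clones stdin with dup():

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal sketch: deliberately exhaust the per-process descriptor limit.
   dup(0) clones stdin until the kernel refuses with EMFILE. */
int main(void)
{
    int count = 0;
    while (dup(0) != -1)
        count++;
    if (errno == EMFILE)
        printf("hit the per-process limit after %d extra descriptors\n", count);
    return 0;
}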

Reed Copsey
54

I had a similar problem. The quick solution is:

ulimit -n 4096

The explanation is as follows: each server connection is a file descriptor. In CentOS, Red Hat and Fedora, and probably others, the per-user file limit is 1024 - no idea why. It can easily be seen when you type: ulimit -n

Note this does not have much relation to the system-wide max files limit (/proc/sys/fs/file-max).

In my case it was a problem with Redis, so I did:

ulimit -n 4096
redis-server -c xxxx

In your case, instead of Redis, you need to start your own server.

Nick
  • And the answer to a memory leak is... buy more memory? No, fix the file leak. – Rafael Baptista May 21 '13 at 15:56
  • It seems you do not understand the problem (or you placed the comment under the wrong answer?). It has to do with the file descriptor limit, and nothing to do with memory or a memory leak. – Nick May 21 '13 at 17:48
  • The file limit is 1024 because otherwise you run into a [fundamental problem with `select()`](http://beesbuzz.biz/blog/e/2013/10/10-the_problem_with_select_vs_poll.php). – fluffy Aug 08 '14 at 18:40
  • @RafaelBaptista A high number of concurrent connections is actually needed in some cases, for instance a high-performance chat server. This does not have to be about leaking FDs. – Antwan van Houdt Jun 03 '16 at 10:43
  • @RafaelBaptista: if you have a server that can handle more than 512 parallel connections you need A LOT MORE open files. Modern servers can handle multiple million parallel connections, so having a limit as low as 1024 really does not make any sense. It might be okay as a default limit for casual users, but not for server software handling parallel client connections. – Mikko Rantalainen Apr 07 '20 at 07:06
  • However, most programs do not need more than 6-7 file descriptors. And if you have more than 1024, select() can have a problem: suppose there is a web server using select(), and it opens just 10 file descriptors, but one of them has a number above 1024. The server cannot select() on that descriptor in any way. The value of 1024 is hard-coded and cannot be changed easily. This is why poll() was introduced. Later, epoll() and kqueue() were introduced for performance reasons, because poll() copies memory all the time. – Nick Nov 10 '20 at 06:05
  • Can you please tell me whether this limit can be changed without a reboot? I am using Ubuntu, Linux kernel `5.11.0-37-generic`. – user786 Oct 11 '21 at 04:35
19

Use lsof -u `whoami` | wc -l to find how many open files the user has

Kokizzu
Edson Medina
17

TCP has a feature called "TIME_WAIT" that ensures connections are closed cleanly. It requires one end of the connection to stay listening for a while after the socket has been closed.

In a high-performance server, it's important that it's the clients who go into TIME_WAIT, not the server. Clients can afford to have a port open, whereas a busy server can rapidly run out of ports or have too many open FDs.

To achieve this, the server should never close the connection first -- it should always wait for the client to close it.
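A minimal sketch of that pattern, assuming conn_fd is the accepted socket: drain the connection until read() returns 0 (meaning the client closed first), and only then close.

#include <unistd.h>

/* Minimal sketch: let the client close first so TIME_WAIT lands on the
   client side. conn_fd is a placeholder for the accepted socket. */
void serve_connection(int conn_fd)
{
    char buf[4096];
    ssize_t n;

    /* read() returning 0 means the client has closed its end */
    while ((n = read(conn_fd, buf, sizeof buf)) > 0) {
        /* ... parse the request data and write() the response here ... */
    }

    /* the server does the passive close, so it does not enter TIME_WAIT */
    close(conn_fd);
}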

Ed4
  • No. TCP TIME_WAIT will hold sockets open at the operating system level and eventually cause the server to reject incoming connections. When you close the file handle, it's closed. http://stackoverflow.com/questions/1803566/what-is-the-cost-of-many-time-wait-on-the-server-side – Rafael Baptista May 21 '13 at 15:57
  • It's true that the file handle closes immediately and I misspoke. But my main point still stands, because even though the FD is freed, the TCP port remains allocated during TIME_WAIT, and a busy server can run out of TCP ports or spend too much kernel memory tracking them. – Ed4 Apr 08 '15 at 17:10
16

This means that you have reached the maximum number of simultaneously open files.

Solved:

At the end of the file /etc/security/limits.conf you need to add the following lines:

* soft nofile 16384
* hard nofile 16384

In the current console, as root (sudo does not work), run:

ulimit -n 16384

This step is optional if it is possible to restart the server instead.

In the /etc/nginx/nginx.conf file, set the new worker_connections value equal to 16384 divided by the worker_processes value.

If you did not run ulimit -n 16384, you need to reboot; then the problem will go away.

PS:

If, after the fix, the error accept() failed (24: Too many open files) is still visible in the logs:

In the nginx configuration, specify (for example):

worker_processes 2;

worker_rlimit_nofile 16384;

events {
  worker_connections 8192;
}
alfonx
shilovk
6

I had this problem too. You have a file handle leak. You can debug this by printing out a list of all the open file handles (on POSIX systems):

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <iostream>

using std::cerr;

typedef int s32;   // the original code relies on a project-specific s32 typedef

void showFDInfo( s32 fd );   // forward declaration for the per-descriptor overload

void showFDInfo()
{
   s32 numHandles = getdtablesize();

   for ( s32 i = 0; i < numHandles; i++ )
   {
      s32 fd_flags = fcntl( i, F_GETFD );
      if ( fd_flags == -1 ) continue;

      showFDInfo( i );
      cerr << "\n";   // one line of output per open descriptor
   }
}

void showFDInfo( s32 fd )
{
   char buf[256];

   s32 fd_flags = fcntl( fd, F_GETFD );
   if ( fd_flags == -1 ) return;

   s32 fl_flags = fcntl( fd, F_GETFL );
   if ( fl_flags == -1 ) return;

   // resolve the descriptor to a path via /proc/self/fd
   char path[256];
   sprintf( path, "/proc/self/fd/%d", fd );

   memset( &buf[0], 0, 256 );
   ssize_t s = readlink( path, &buf[0], sizeof( buf ) - 1 );   // leave room for the NUL
   if ( s == -1 )
   {
        cerr << " (" << path << "): " << "not available";
        return;
   }
   cerr << fd << " (" << buf << "): ";

   if ( fd_flags & FD_CLOEXEC )  cerr << "cloexec ";

   // file status flags
   if ( fl_flags & O_APPEND   )  cerr << "append ";
   if ( fl_flags & O_NONBLOCK )  cerr << "nonblock ";

   // access mode (O_RDONLY is 0, so mask with O_ACCMODE rather than testing bits)
   s32 acc_mode = fl_flags & O_ACCMODE;
   if ( acc_mode == O_RDONLY  )  cerr << "read-only ";
   if ( acc_mode == O_RDWR    )  cerr << "read-write ";
   if ( acc_mode == O_WRONLY  )  cerr << "write-only ";

   if ( fl_flags & O_DSYNC    )  cerr << "dsync ";
   if ( fl_flags & O_RSYNC    )  cerr << "rsync ";
   if ( fl_flags & O_SYNC     )  cerr << "sync ";

   // report any advisory lock held on the file
   struct flock fl;
   fl.l_type = F_WRLCK;
   fl.l_whence = 0;
   fl.l_start = 0;
   fl.l_len = 0;
   fcntl( fd, F_GETLK, &fl );
   if ( fl.l_type != F_UNLCK )
   {
      if ( fl.l_type == F_WRLCK )
         cerr << "write-locked";
      else
         cerr << "read-locked";
      cerr << "(pid:" << fl.l_pid << ") ";
   }
}

By dumping out all the open files you will quickly figure out where your file handle leak is.

If your server spawns subprocesses - e.g. if this is a 'fork'-style server, or if you are spawning other processes (e.g. via CGI) - you have to make sure to create your file handles with "cloexec", both for real files and for sockets.

Without cloexec, every time you fork or spawn, all open file handles are cloned in the child process.
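On Linux you can set close-on-exec atomically when the descriptor is created; a minimal sketch, with purely illustrative helper names:

#define _GNU_SOURCE            /* for accept4() */
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>

/* Minimal sketch: create descriptors with close-on-exec set from the start,
   so forked/exec'ed children do not inherit them. */
int make_listen_socket(void)
{
    /* SOCK_CLOEXEC sets FD_CLOEXEC atomically on the new socket */
    return socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

int accept_client(int listen_fd)
{
    /* accept4() lets you set it on accepted sockets as well */
    return accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);
}

int open_log(const char *path)
{
    /* O_CLOEXEC does the same for regular files */
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644);
}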

It is also really easy to fail to close network sockets - e.g. just abandoning them when the remote party disconnects. This will leak handles like crazy.

Rafael Baptista
5

On MacOS, show the limits:

launchctl limit maxfiles

The result looks like: maxfiles 256 1000

If the numbers (soft limit & hard limit) are too low, you have to set them higher:

sudo launchctl limit maxfiles 65536 200000
letanthang
4

It can take a bit of time before a closed socket is really freed up.

Use lsof to list open files.

Use cat /proc/sys/fs/file-max to see if there's a system limit.

Partly Cloudy
3

For future reference, I ran into a similar problem; I was creating too many file descriptors (FDs) by creating too many files and sockets (on Unix OSs, everything is an FD). My solution was to increase the FD limit at runtime with setrlimit().

First I got the FD limits, with the following code:

// This goes somewhere in your code
struct rlimit rlim;

if (getrlimit(RLIMIT_NOFILE, &rlim) == 0) {
    std::cout << "Soft limit: " << rlim.rlim_cur << std::endl;
    std::cout << "Hard limit: " << rlim.rlim_max << std::endl;
} else {
    std::cout << "Unable to get file descriptor limits" << std::endl;
}

After running getrlimit(), I could confirm that on my system, the soft limit is 256 FDs, and the hard limit is infinite FDs (this is different depending on your distro and specs). Since I was creating > 300 FDs between files and sockets, my code was crashing.

In my case I couldn't decrease the number of FDs, so I decided to increase the FD soft limit instead, with this code:

// This goes somewhere in your code
struct rlimit rlim;

rlim.rlim_cur = NEW_SOFT_LIMIT;
rlim.rlim_max = NEW_HARD_LIMIT;

if (setrlimit(RLIMIT_NOFILE, &rlim) == -1) {
    std::cout << "Unable to set file descriptor limits" << std::endl;
}

Note that you can also get the number of FDs that you are using, and the source of these FDs, with this code.

Also, you can find more information on getrlimit() and setrlimit() here and here.

Jaime Ivan Cervantes
2

Just another piece of information about CentOS: in this case, when using "systemctl" to launch a process, you have to modify the system file /usr/lib/systemd/system/processName.service and add this line to the file:

LimitNOFILE=50000

And then just reload your systemd configuration:

systemctl daemon-reload
franck U
2

I had a similar issue on Ubuntu 18 on vSphere. The cause: the config file nginx.conf references too many log files and sockets. Sockets are treated as files in Linux. When running nginx -s reload or sudo service nginx start/restart, the Too many open files error appeared in error.log.

The NGINX worker processes were launched by the nginx user. The ulimit (soft and hard) for the nginx user was 65536. Raising the ulimit and setting limits.conf did not work.

The rlimit setting in nginx.conf did not help either: worker_rlimit_nofile 65536;

The solution that worked was:

$ mkdir -p /etc/systemd/system/nginx.service.d
$ nano /etc/systemd/system/nginx.service.d/nginx.conf
    [Service]
    LimitNOFILE=30000
$ systemctl daemon-reload
$ systemctl restart nginx.service
1

I had the same problem and I wasn't bothering to check the return values of the close() calls. When I started checking the return value, the problem mysteriously vanished.

I can only assume an optimisation glitch of the compiler (gcc in my case) is assuming that close() calls are without side effects and can be omitted if their return values aren't used.
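Regardless of the cause, checking the return value is cheap; a minimal sketch of such a check (close_checked is just an illustrative helper name):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Minimal sketch: log any close() failure instead of silently ignoring it. */
void close_checked(int fd)
{
    if (close(fd) == -1)
        fprintf(stderr, "close(%d) failed: %s\n", fd, strerror(errno));
}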

  • I'm sorry, but that is not plausible at all. If a very slight change in your code made the bug "go away", you most probably have a serious bug in your code that the change hid. Use `valgrind` or other such tools to track it down. A compiler optimizing away a `close` call would be catastrophic. – Mat May 21 '13 at 12:43
  • I agree. Checking the return value from any system call is important, though, because you could get `EAGAIN` in many cases and if you ignore that, all bets are off. – Mikko Rantalainen Apr 07 '20 at 07:09
1

When your program has more open descriptors than the open files ulimit (ulimit -a will list this), the kernel will refuse to open any more file descriptors. Make sure you don't have any file descriptor leaks - for example, by running it for a while, then stopping and seeing if any extra FDs are still open when it's idle - and if it's still a problem, change the nofile ulimit for your user in /etc/security/limits.conf.
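One cheap way to do that idle check on Linux is to count the entries under /proc/self/fd; a minimal sketch (count_open_fds is just an illustrative name):

#include <dirent.h>

/* Minimal sketch: count this process's open descriptors by listing
   /proc/self/fd (Linux-specific). */
int count_open_fds(void)
{
    int count = 0;
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;
    while (readdir(dir) != NULL)
        count++;
    closedir(dir);
    return count - 3;   /* subtract ".", ".." and the descriptor opendir() itself uses */
}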

bdonlan