7

It seems that the server is limited to ~32720 sockets... I have tried every known variable change to raise this limit, but the server stays capped at 32720 open sockets, even with 4 GB of free memory and 80% idle CPU...

Here's the configuration:

~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63931
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 798621
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 2048
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63931
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

net.netfilter.nf_conntrack_max = 999999
net.ipv4.netfilter.ip_conntrack_max = 999999
net.nf_conntrack_max = 999999

Any thoughts?

TheSquad
  • Just so it's said: If you need more than 32000 sockets at once, you have bigger problems than just that number being too low. A normal server doesn't ever have more than a few hundred sockets (maybe even a couple thousand, for a busy server) open at once. – cHao Aug 07 '10 at 12:52
  • A few hundred sockets? Where did you get that number? – TheSquad Aug 07 '10 at 13:24
  • @TheSquad: do you have some security framework loaded, that limits the number of fd's and/or connections? – mvds Aug 07 '10 at 13:39
  • Experience. Even extremely busy web sites rarely serve more than a couple thousand simultaneous clients -- once they get to that point, they're clustered or otherwise distributed to reduce load. And the QuakeNet IRC network, the best example i could think of for mass long-lived TCP client/server stuff, has maybe 80k simultaneous users spread over 40+ servers. That's about 2k per. – cHao Aug 07 '10 at 13:44
  • @mvds: The limit is most likely not due to security stuff -- security would kick in WAY before 32k sockets. – cHao Aug 07 '10 at 13:45
  • @cHao: it is not a web server, and an IRC server eats far more performance than the software we made. With 2K clients connected to an IRC server, CPU usage is close to 100%, and I'm not even talking about memory... – TheSquad Aug 07 '10 at 14:04
  • @mvds: no, no security framework loaded – TheSquad Aug 07 '10 at 14:18
  • Ya know, if we don't count the obvious flaw in a program that needs so many sockets, then this becomes an admin issue. Voting to move to serverfault. – cHao Aug 07 '10 at 16:13
  • Would you care to explain how you are testing this? Where does this number of connections come from? Testing on the same box? Between two machines? Any errors on server or client? – Nikolai Fetissov Aug 09 '10 at 02:19
  • [How many socket connections possible?](https://stackoverflow.com/q/651665/608639) – jww Nov 18 '19 at 02:40

8 Answers

7

If you're dealing with OpenSSL and threads, check your /proc/sys/vm/max_map_count and try raising it.
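
For reference, one way to inspect and raise it as root (the 262144 figure below is only an illustrative value, not a recommendation from this answer):

cat /proc/sys/vm/max_map_count
echo 262144 > /proc/sys/vm/max_map_count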

fedj
4

In IPv4, the TCP header has 16 bits for the destination port and 16 bits for the source port.

see http://en.wikipedia.org/wiki/Transmission_Control_Protocol

Seeing that your limit is ~32K, I would expect that you are actually hitting the limit on outbound TCP connections you can make. You should be able to get a maximum of about 65K sockets per source IP address (this would be the protocol limit, since the source port is a 16-bit field); that is the limit on the total number of distinct connection endpoints from one address. Fortunately, binding a port for incoming connections only uses one. But if you are trying to test the number of connections from the same machine, you can only have about 65K total outgoing TCP connections. To test the number of incoming connections, you will need multiple client machines.

Note: you can call socket(AF_INET,...) up to the number of file descriptors available, but you cannot bind them without increasing the number of ports available. To increase the range, do this:

echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range (cat it to see what you currently have--the default is 32768 to 61000)

Perhaps it is time for a new TCP-like protocol that allows 32 bits for the source and destination ports? But how many applications really need more than 65 thousand outbound connections?

The following will allow 100,000 incoming connections on Linux Mint 16 (64-bit); you must run it as root to raise the limits:

#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

void ShowLimit()
{
   struct rlimit lim;
   int err = getrlimit(RLIMIT_NOFILE, &lim);
   printf("%d limit: %ld,%ld\n", err, (long)lim.rlim_cur, (long)lim.rlim_max);
}

int main()
{
   ShowLimit();

   /* raise the per-process file-descriptor limit (run as root to raise the hard limit) */
   struct rlimit lim;
   lim.rlim_cur = 100000;
   lim.rlim_max = 100000;
   int err = setrlimit(RLIMIT_NOFILE, &lim);
   printf("set returned %d\n", err);

   ShowLimit();

   int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
   struct sockaddr_in maddr;
   memset(&maddr, 0, sizeof(maddr));
   maddr.sin_family = AF_INET;
   maddr.sin_port = htons(80);
   maddr.sin_addr.s_addr = INADDR_ANY;

   err = bind(sock, (struct sockaddr *)&maddr, sizeof(maddr));
   err = listen(sock, 1024);

   /* accept connections forever without closing them, counting how many we hold */
   int sockets = 0;
   while (1)
   {
      struct sockaddr_in raddr;
      socklen_t rlen = sizeof(raddr);
      err = accept(sock, (struct sockaddr *)&raddr, &rlen);
      if (err >= 0)
      {
         ++sockets;
         printf("%d sockets accepted\n", sockets);
      }
   }
}
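
To try it out, compiling with gcc and running it as root should be enough (the file name here is just a placeholder):

gcc -o accept_test accept_test.c
sudo ./accept_test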
2

Which server are you talking about? It might have a hardcoded max, or run into other limits (max threads, out of address space, etc.).

http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1 has some of the tuning needed to achieve a lot of connections, but it doesn't help if the server application limits it in some way or another.

nos
  • I'm talking about a Core i7 with 16 GB of RAM and a 160 GB SSD, running Debian... Good article you posted, by the way; not sure it will fix the issue, but good to know. I'll let you know how it goes... – TheSquad Aug 07 '10 at 13:26
  • Sorry, I didn't get what you were asking at first... The server application is software we made, with no limitations – TheSquad Aug 07 '10 at 14:10
  • A custom server app doesn't get 32k simultaneous clients unless it's made by a noteworthy company or does something shady. In the first case, you wouldn't need help -- someone who didn't understand scaling issues wouldn't have gotten hired. – cHao Aug 07 '10 at 14:29
  • Then call bullshit; I don't have to justify myself to get an answer from you... This issue is known to be tricky, and I'm not even sure anyone has a good answer to it (how many sockets can a server handle at max?). The fact is that we have a lot more than 32K connections going on, only each server is limited to 32K. Right now, across all clustered servers, we have more than 1 million connections. We are looking for solutions to lower the number of servers. That's it! – TheSquad Aug 07 '10 at 14:37
  • Umm, yeah, you *do* have to justify yourself to get an answer from me. SO doesn't pay me -- i'm here because i like solving problems. However, i'm not into helping people solve the wrong problem -- and so far, the problem seems more to be this supposed requirement for 32k+ simultaneous long-lived connections on one box, rather than a kernel and/or runtime limit that hardly anyone but stress testers even know exists. So unless i see that that's necessary, i'm going to continue to say "use fewer sockets". – cHao Aug 07 '10 at 16:05
  • @TheSquad As it's software you've written yourself, are you really sure there are no limitations? Are you using threads? select(), poll() or epoll()? What's the error you get when you reach 32720 sockets? What language/API is it using? – nos Aug 07 '10 at 16:35
  • @cHao it's not that uncommon. 32k isn't really a lot - we've had people (a small 5-man company) serving *a lot* more than that from a simple iPhone app they made. – nos Aug 07 '10 at 16:55
  • @nos: Constantly connected? I'm not saying it's unusual to serve 32k clients -- Google's or MS's web stats would make that number look positively puny -- but to have that many clients connected simultaneously, to one machine, is highly unusual in my experience. – cHao Aug 07 '10 at 17:01
  • @cHao It was using Comet, the clients stayed on about 3 minutes on average during peak hours. – nos Aug 07 '10 at 17:21
  • @nos: Yes, it is using threads, pthreads... but that is not the issue. @cHao: I'm not sure you want to help so much as understand what we do that involves so many clients, lol... – TheSquad Aug 07 '10 at 18:20
  • @cHao: If you have any way to use one socket for multiple clients at the same time without disconnecting them, then please advise; if not, the problem stays the same. I'm not looking for someone to tell me something is broken in the design, but for a solution... If you don't believe me when I say we have a lot more than 32K clients on the server at the same time, then I can't say anything else. – TheSquad Aug 07 '10 at 20:42
  • @TheSquad I'm just asking to learn how you're hitting the limit - e.g. whether you max out the address space, exhaust the process/thread id space (easy with threads), or what eventually fails (like specific socket errors). – nos Aug 07 '10 at 20:58
  • I know, nos; unfortunately we can't test that right now, too many people on the servers... I will have to test it outside peak hours... But normally max-thread, pid_max, fd_max and stack-size are correctly set up; we did a stress test before and got ±2^20 threads running on the server. – TheSquad Aug 07 '10 at 21:33
  • @nos, be sure I'll post everything you need to know as soon as we are able to test it – TheSquad Aug 07 '10 at 21:33
  • I've written a server that does nothing but accept connections til it can't anymore, and a client on the same machine that constantly connects til it can't anymore. With the sysctls from the answer applied, my only limitation was the local port range, which i'd widened to 50000 ports, but was only using from one IP (localhost). That's 50000 sockets each for server and client, meaning 100k sockets total, and it'd have been more if i cared to widen the port range more. But after the first try, things started flaking out around 27k, so i stopped. – cHao Aug 08 '10 at 00:53
2

Check the real limits of the running process with:

cat /proc/{pid}/limits

The max for nofile is determined by the kernel; the following, run as root, would increase the max to 100,000 "files", i.e. 100k concurrent connections:

echo 100000 > /proc/sys/fs/file-max

To make it permanent, edit /etc/sysctl.conf:

fs.file-max = 100000

You then need the server to ask for more open files; this is different for each server. In nginx, for example, you set

worker_rlimit_nofile 100000;

Restart nginx and check /proc/{pid}/limits.

To test this you need 100,000 sockets in your client; when testing you are limited by the number of TCP ports available per IP address.

To increase the local port range to maximum...

echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

This gives you ~64000 ports to test with.

If that is not enough, you need more IP addresses. When testing on localhost you can bind the source/client to an IP other than 127.0.0.1 / localhost.

For example, you can bind your test clients to IPs randomly selected from the range 127.0.0.1 to 127.0.0.5.

Using apache-bench you would set

-B 127.0.0.x

Nodejs sockets would use

localAddress
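
If the test client is hand-written in C instead, the same effect comes from bind()ing the socket to a chosen source address before connect(). A minimal sketch, assuming 127.0.0.2 as the extra client IP and 127.0.0.1:80 as the server under test (both placeholders):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main()
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    /* bind the client socket to a specific source IP; port 0 lets the kernel pick one */
    struct sockaddr_in src;
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(0);
    inet_pton(AF_INET, "127.0.0.2", &src.sin_addr);
    if (bind(sock, (struct sockaddr *)&src, sizeof(src)) < 0)
        perror("bind");

    /* connect to the server being load-tested (placeholder address and port) */
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(80);
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);
    if (connect(sock, (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("connect");

    close(sock);
    return 0;
}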

/etc/security/limits.conf configures PAM; it's usually irrelevant for a server daemon.

If the server is proxying requests over TCP, using upstream or mod_proxy for example, the server itself is limited by ip_local_port_range for its outgoing connections. This could easily be the source of the ~32,000 limit.

teknopaul
1

If you're considering an application where you believe you need to open thousands of sockets, you will definitely want to read about The C10k Problem. That page discusses many of the issues you will face as you scale up your number of client connections to a single server.

Greg Hewgill
  • The C10K problem is from 2003... With 32000 clients connected the server still has great performance; it can handle much more, believe me! – TheSquad Aug 07 '10 at 13:02
  • Don't you think a seven-year-old problem is still relevant with today's Core i7, 8 GB of RAM, and two 1 Gb network interfaces? Like I said in my first post, with 32720 clients connected the CPU is still under 10% use, and there is more than enough free memory (4 GB) to open more connections. Here are some ifstat rows for eth0 (KB/s in, KB/s out): 89.22 145.37, 126.97 136.15, 104.11 158.18, 84.17 123.62, 90.64 106.47, 93.17 125.98, 97.21 130.69 – TheSquad Aug 07 '10 at 13:13
  • @TheSquad, aha, and most TCP stacks were written 30 years ago. Gigs of RAM have nothing to do with this, it's the client port range. You obviously have no clue, so do yourself a favor and listen to what experienced people have to say. – Nikolai Fetissov Aug 09 '10 at 02:08
  • Do yourself a favor and read this: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1, experienced one... – TheSquad Aug 09 '10 at 09:38
  • I pointed out the RAM because each connection is SSL, and SSL sessions take RAM... – TheSquad Aug 09 '10 at 09:40
0

Generally, having too many live connections is a bad thing. However, everything depends on the application and the patterns with which it communicates with its clients.

I suppose there are patterns where clients have to stay permanently (asynchronously) connected, and that it is the only way a distributed solution can work.

Assuming there are no bottlenecks in memory/CPU/network for the current load, and keeping in mind that leaving connections idle but open may be the only way a distributed application can consume fewer resources (say, connection setup time and overall/peak memory), overall OS network performance might be better than with the best practices we all know.

Good question, and it needs a solution. The problem is that nobody can answer it from here. I would suggest a divide-and-conquer approach: find the bottleneck, then report back.

Take your application apart on a testbed and you will find the bottleneck.

Jonas G. Drange
  • Hi, this question is 2 years old, and of course I have gone past this limit multiple times since... Actually, the real limit hit by the Linux kernel is the number of file descriptors (open files)... that is the only limit. The limit on sockets is actually 32768 per IP. Right now I have a working server (using Erlang) with something close to 2 million users connected. – TheSquad Aug 03 '12 at 18:34
0

On GNU/Linux, the maximum is what you wrote. This number is (probably) stated somewhere in the networking standards. I doubt you really need so many sockets; you should optimize the way you are using sockets instead of creating dozens all the time.

skalee
  • No, a socket is just a limited resource. Clients use sockets. It is not true that socket = connected client, or that each client needs its own socket; it depends on the protocol. For example, TCP needs such an association (1 socket - 1 client) but UDP does not. Even when using TCP, who said the connection must be continuous? – skalee Aug 07 '10 at 14:10
  • I meant that in our software a client = a socket... We use SSL, so UDP is out of the question, and the connection needs to be continuous... – TheSquad Aug 07 '10 at 14:24
0

In net/socket.c the fd is allocated in sock_alloc_fd(), which calls get_unused_fd().

Looking at linux/fs/file.c, the only limit to the number of fd's is sysctl_nr_open, which is limited to

int sysctl_nr_open_max = 1024 * 1024; /* raised later */

/// later...
sysctl_nr_open_max = min((size_t)INT_MAX, ~(size_t)0/sizeof(void *)) &
                         -BITS_PER_LONG;

and can be read with sysctl fs.nr_open, which gives 1M by default here. So the fds are probably not your problem.
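
(Should it ever become the bottleneck, it can also be raised at runtime; the value below is only an arbitrary example.)

sysctl -w fs.nr_open=2097152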

Edit: you probably checked this as well, but would you care to share the output of

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main() {
    struct rlimit limit;
    getrlimit(RLIMIT_NOFILE, &limit);
    /* rlim_t is not an int, so cast before printing */
    printf("cur: %ld, max: %ld\n", (long)limit.rlim_cur, (long)limit.rlim_max);
    return 0;
}

with us?

mvds
  • Yeah, fds are fine, that was the first thing I checked... I'm more concerned about ports, but even there, there should be ~32000 more ports available – TheSquad Aug 07 '10 at 13:42
  • Ports should be fine too. If you're running a server, it should be listening on one port, and all the clients would be connected to that same port number. Only a few protocols work differently -- with FTP being the only one i can come up with right off -- and that's because it uses a separate socket for data transfer. – cHao Aug 07 '10 at 14:02
  • your question was on sockets, and those don't seem to be the problem. "ports" cannot be a problem if you're the server and clients connect to you. Otherwise, it can be and you may have to increase net.ipv4.ip_local_port_range. Please be a little more specific on the situation; what exactly fails, giving what return value? – mvds Aug 07 '10 at 14:03
  • @cHao: the incoming port is the same, but connections are two-way; the outgoing port is not the same for each client. – TheSquad Aug 07 '10 at 14:41
  • @mvds : the result is 798621 for both – TheSquad Aug 07 '10 at 15:09
  • If you need one port per client, increase the port range, and use multiple IPs. There's only 64k ports in 2 bytes ;-) – mvds Aug 07 '10 at 15:15
  • @mvds: I'm aware of that; the port range has been increased, but even then we don't reach the limit of 64K, only 32K... However, there is the possibility to bind the service to other local addresses, so not really an issue actually... – TheSquad Aug 07 '10 at 15:24