
I have a problem: I want to create an Erlang server that can hold 1M simultaneous open TCP connections. I tuned my OS (Oracle Linux 7) to raise the file descriptor limit. On the server I call gen_tcp:listen, then:

% point_1
{ok, Socket} = gen_tcp:accept(ListenSocket),
spawn(fun() -> handle(Socket) end),  % another process
% back to point_1

If I connect sequentially it's no problem: in 100 seconds I connected 100K clients, but I didn't have the patience to go further.

If I try to connect them concurrently, only around 80 out of 100 connections are made, for example.

This is how I run everything:

erlc *.erl
erl +Q 134217727 +P 1000000 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000

// start one server that will listen on port 9999

ex:start(1, 9999) 

// 100 clients try to connect on port 9999

ex:connect_clients(100, 9999)

Let me show you some code:

start(Num,LPort) ->
  case gen_tcp:listen(LPort,[{active, false},{packet,2}]) of
    {ok, ListenSock} ->
      start_servers(Num,ListenSock),
      {ok, Port} = inet:port(ListenSock),
      Port;
    {error,Reason} ->
      {error,Reason}
  end.

start_servers(0,_) ->
  ok;
start_servers(Num,LS) ->
  spawn(?MODULE,server,[LS,0]),
  start_servers(Num-1,LS).

server(LS, Nr) ->
  io:format("before accept ~w~n",[Nr]),
  case gen_tcp:accept(LS) of
    {ok,S} ->
      io:format("after accept ~w~n",[Nr]),
      spawn(ex,loop,[S]),
      server(LS, Nr+1);
    Other ->
      io:format("accept returned ~w - goodbye!~n",[Other]),
      ok
  end.

loop(S) ->
  inet:setopts(S,[{active,once}]),
  receive
    {tcp,S, _Data} ->
      Answer = 1, 
      gen_tcp:send(S,Answer),
      loop(S);
    {tcp_closed,S} ->
      io:format("Socket ~w closed [~w]~n",[S,self()]),
      ok
  end.

client(PortNo) ->
  {ok,Sock} = gen_tcp:connect("localhost", PortNo,
    []).

connect_clients(Number, Port) ->
  spawn(ex, client, [Port]),
  case Number of
    0 -> ok;
    _ -> connect_clients(Number-1, Port)
  end.
Chenmunka
Ștefan Stan

1 Answer


I see at least two issues here:

  • You need to raise your listen backlog; it defaults to 5. You can raise it by setting {backlog, N} in your listen options, e.g., {backlog, 1024}.

  • Your server/2 function is faulty because it accepts a connection, then spawns a new process to run loop/1, but it doesn't make that new process the controlling process for the accepted socket. The loop/1 function sets {active,once} mode on the socket in order to receive incoming messages, but since it's not running in the controlling process, those messages are delivered to the acceptor process instead and loop/1 never receives them. (You should also verify the return value of inet:setopts/2 by writing ok = inet:setopts(S,[{active,once}]) there.)

Instead of spawning the loop, you should spawn a new acceptor, like this:

server(LS, Nr) ->
  io:format("before accept ~w~n",[Nr]),
  case gen_tcp:accept(LS) of
    {ok,S} ->
      io:format("after accept ~w~n",[Nr]),
      spawn(ex,server,[LS,Nr+1]),
      loop(S);
    Other ->
      io:format("accept returned ~w - goodbye!~n",[Other]),
      ok
  end.

With this approach, the process that accepted the socket runs loop/1 and so there's no need to change the socket's controlling process.
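For comparison, here is a minimal sketch of the other way to address both points while keeping the question's original structure: the acceptor keeps accepting and hands each socket to a freshly spawned process with gen_tcp:controlling_process/2. The module name handoff_sketch, the handler/0 function, the {socket, S} message, and the echo reply are all assumptions made for illustration, not part of the original code. The backlog value 1024 is the example figure from above, and the OS may cap it.

```erlang
-module(handoff_sketch).
-export([start/1, acceptor/1, handler/0]).

%% Open the listen socket with a backlog larger than the default of 5.
%% The calling process owns the listen socket, so it must stay alive.
start(LPort) ->
    Opts = [binary, {active, false}, {packet, 2},
            {reuseaddr, true}, {backlog, 1024}],
    case gen_tcp:listen(LPort, Opts) of
        {ok, LS} ->
            spawn(?MODULE, acceptor, [LS]),
            {ok, LS};
        {error, Reason} ->
            {error, Reason}
    end.

%% Accept a connection, hand it to a new handler process, then accept again.
acceptor(LS) ->
    case gen_tcp:accept(LS) of
        {ok, S} ->
            Pid = spawn(?MODULE, handler, []),
            %% Make the handler the controlling process so that
            %% {tcp, ...} messages are delivered to it, not to us.
            case gen_tcp:controlling_process(S, Pid) of
                ok ->
                    Pid ! {socket, S};
                {error, _} ->
                    exit(Pid, kill),
                    gen_tcp:close(S)
            end,
            acceptor(LS);
        {error, _Reason} ->
            ok
    end.

%% Wait until ownership has been transferred before touching the socket.
handler() ->
    receive
        {socket, S} -> loop(S)
    end.

loop(S) ->
    ok = inet:setopts(S, [{active, once}]),
    receive
        {tcp, S, Data} ->
            %% Hypothetical reply: echo the payload back.
            _ = gen_tcp:send(S, Data),
            loop(S);
        {tcp_closed, S} ->
            ok
    end.
```

The socket stays in passive mode until the handler calls inet:setopts/2, so no data can arrive as a message before the ownership transfer completes. The approach shown in the answer avoids this handoff entirely, which is why it is the simpler of the two.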

Steve Vinoski
  • Indeed, you were right about those two issues, and I fixed them. Now when I call connect_clients(1000, 9999) it connects around 100 clients per second, gets to about 800, and then stops. The server doesn't crash, so I can call it again, connecting one client or whatever number I want. But I can't connect more than around 800 per call. I wish I could call connect_clients(1000000, 9999), but then my VM freezes. Any thoughts? Thanks for what you have already helped me with. – Ștefan Stan Sep 22 '15 at 10:33
  • Also, remember, you can have several acceptor processes calling `gen_tcp:accept/1` on the same listen socket for better throughput when accepting. – Adam Lindberg Sep 22 '15 at 17:02
  • @ȘtefanStan hard to say what new problems you're hitting. Are you running with `sasl` enabled? You might start it up to make sure you're not hitting any problems that may be going unnoticed. Are you sure your OS is set up properly to allow the number of connections you need? What does `ulimit -n` in your shell indicate? Also, what version of Erlang/OTP are you using? – Steve Vinoski Sep 22 '15 at 21:50
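
Some of the checks asked about in the last comment can be made from inside the running VM. The following is a rough sketch, assuming a Unix-like host where `ulimit` is available as a shell builtin; the module name limits_sketch and the report/0 function are hypothetical. It prints the OTP release, the port and process limits the emulator is actually running with (which the +Q and +P flags are meant to influence), the current counts, and the file descriptor limit inherited by the VM.

```erlang
-module(limits_sketch).
-export([report/0]).

%% Collect and print the limits that bound how many sockets and
%% processes this VM can hold open at once.
report() ->
    Info = [{otp_release,   erlang:system_info(otp_release)},
            {port_limit,    erlang:system_info(port_limit)},
            {port_count,    erlang:system_info(port_count)},
            {process_limit, erlang:system_info(process_limit)},
            {process_count, erlang:system_info(process_count)},
            %% Runs in a child shell, which inherits the VM's own limit.
            %% The result is a string with a trailing newline.
            {ulimit_n,      os:cmd("ulimit -n")}],
    [io:format("~p: ~p~n", [K, V]) || {K, V} <- Info],
    Info.
```

Calling limits_sketch:report() before and during a connection test shows whether port_count is approaching port_limit or the descriptor limit. Each connected socket uses one port and one file descriptor, and in the designs above each one also uses a process.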