
I have a Node.js client (10.177.62.7) requesting data from an HTTP REST service on a server (10.177.0.1). The client simply uses the Node.js http.request() method (agent=false). The client runs on an Ubuntu 11.10 box.

Why does the client send its FIN ACK only after 475 ms? Why so slow? It should send the FIN ACK immediately. I see many cases like this: about 1% of all traffic consists of requests with a delayed FIN ACK.

CPU idle on the client is about 99%, so nothing is draining the CPU.

How can I debug this? What could it be? Is there a sysctl option I need to tune?

In the screenshot, the second column is the elapsed time between packets.

Link to bigger picture.


codeforester
Tereska
  • I deleted my answer about HTTP keep-alive since it was definitively ruled out. Can't think of any other answers though. The FIN should go out as soon as the socket is closed. – Alan Curry Jul 29 '12 at 21:23
  • @AlanCurry But the *FIN/ACK* would only go out when the client has read the incoming FIN and decided to close the socket, which could take any amount of time. This is a behaviour of node.js, not the TCP/IP stack. – user207421 Jul 30 '12 at 00:11
  • Sure, but if it's in the middle of a call to an http client library, it's not doing keep-alive, and the CPU load is 1%, what's taking it so long to close the socket after reading EOF? – Alan Curry Jul 30 '12 at 00:15
  • How many simultaneous connections? How many total connections are you generating? How large is the requested object? What is the network latency between client and server? How large is your POST object? How many of those are happening at the same time? Do you implement any kind of back-off in your client when there are errors connecting? – jxh Aug 06 '12 at 21:42
  • There are about 1000 simultaneous connections. The requested object is very small (about 150 bytes); latency is 2-3 ms. The POST is about 1-20 KB. There is no back-off logic. – Tereska Aug 07 '12 at 16:20
  • @Tereska: You might have burned through all your ephemeral ports, and are waiting for the 2xMSL timeout to finish for some of them before new connections can be created. Did you check netstat? – jxh Aug 11 '12 at 00:40
  • My sysctl is tuned for this. I have port range starting at 1024, time_wait 1s. netstat is showing about 1k connections established and couple in time_wait. – Tereska Aug 11 '12 at 10:39
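
One way to test the point raised in the comments above (how long the client takes to close the socket after reading the server's EOF) is to timestamp the relevant events inside the client. The following is a minimal sketch, not a definitive tool. `timedRequest` and `closeDelayMs` are hypothetical names, and it assumes the socket's 'close' event fires near the moment the client's FIN is sent, which should be cross-checked against a packet capture.

```javascript
const http = require('http');

// Pure helper: milliseconds between reading the server's FIN ('end' on the
// socket) and the local socket being closed. Returns NaN if a mark is missing,
// for example when the client closed first.
function closeDelayMs(marks) {
  return marks.socketClosed - marks.peerFin;
}

// Hypothetical wrapper (not part of Node.js): issues one request without a
// pooled agent and records wall-clock timestamps for the key events.
function timedRequest(options, done) {
  const marks = { started: Date.now() };
  let finished = false;
  const finish = (err) => {
    if (finished) return; // report only once (an error is followed by 'close')
    finished = true;
    done(err, marks);
  };

  const req = http.request(Object.assign({}, options, { agent: false }), (res) => {
    res.on('data', () => {}); // drain the body so 'end' can fire
    res.on('end', () => { marks.bodyEnded = Date.now(); });
  });

  req.on('socket', (socket) => {
    socket.on('end', () => { marks.peerFin = Date.now(); }); // peer's FIN was read
    socket.on('close', () => {
      marks.socketClosed = Date.now();
      finish(null);
    });
  });
  req.on('error', finish);
  req.end();
}
```

Calling `timedRequest({ host: '10.177.0.1', path: '/' }, cb)` (the path is a placeholder) and logging `closeDelayMs(marks)` for each request would show whether the slow 1% corresponds to a gap inside the process or only appears on the wire.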

1 Answer


This behaviour is the Delayed ACK feature of an RFC 1122 TCP stack.

Normally you would set the TCP_QUICKACK option on your Linux TCP socket to disable delayed ACK, but I don't think this is exposed by the Node.js JavaScript API (I only saw socket.setNoDelay, for the TCP_NODELAY option).
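
For reference, here is a minimal sketch of the one socket option mentioned above that Node.js does expose. `disableNagle` and `requestWithNoDelay` are hypothetical helper names. Note that this sets TCP_NODELAY (it turns off Nagle's algorithm for outgoing data) and is not a substitute for TCP_QUICKACK, so it may well have no effect on the delayed FIN ACK.

```javascript
const http = require('http');

// Hypothetical helper: turn off Nagle's algorithm on a socket-like object.
// socket.setNoDelay(true) maps to TCP_NODELAY, not TCP_QUICKACK.
function disableNagle(socket) {
  socket.setNoDelay(true);
  return socket;
}

// Hypothetical helper: one-off request (no pooled agent) whose socket has
// TCP_NODELAY set as soon as the socket is assigned to the request.
function requestWithNoDelay(options, onResponse) {
  const req = http.request(Object.assign({}, options, { agent: false }), onResponse);
  req.on('socket', disableNagle);
  return req;
}
```

The caller still has to call `req.end()` on the returned request, exactly as with a plain http.request().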

So your idea of applying a system-wide change to the TCP stack seems good, but I found no sysctl matching this socket option's behaviour. Here is another full list with explanation.

Community
Yves Martin
  • This looks like the right answer unless there are more details the OP has not provided – Mike Pennington Aug 02 '12 at 15:47
  • Thanks for your answer! I have some additional questions. Is this valid for FIN ACK, or only for a standalone ACK? Can you tell me why the kernel delayed that FIN/ACK? In the tcpflow data there is exactly one ACK per packet coming from the server. The kernel could have avoided sending an ACK for every data packet, but it chose to delay the client's FIN/ACK instead? Is this possible? – Tereska Aug 02 '12 at 18:23
  • Please read section 4.2.3.2 of RFC 1122... it is not specific to FIN packets. I agree with you that there is no reason to delay a FIN+ACK packet without data. I invite you to contact the Linux kernel TCP team on LKML, or to open the source code of the delayed ACK implementation. An optimization can probably be made, or at least a new sysctl switch added to avoid that behaviour. – Yves Martin Aug 02 '12 at 18:41
  • LKML (netdev) says that this is an application issue. http://marc.info/?l=linux-netdev&m=134409837229053&w=2 – Tereska Aug 04 '12 at 18:03
  • You say that only 1% of traffic is affected by the delay. So it does not come from the application code's processing itself, but may come from the JavaScript engine, for example garbage collection latency in discarding the socket after use... You should try running your code with alternative JavaScript engines and different browsers. Have you found any strange lines in dmesg? strace-ing a browser is possible, but the output will be really difficult to analyze. strace-ing the code running in a standalone JavaScript engine may be an option for getting a diagnosis. – Yves Martin Aug 06 '12 at 20:47