
I have five tomcat instances behind nginx.

Sometimes the nginx upstream_response_time is very large, more than 1 second, while the Tomcat local access log shows the processing time is only 50 ms (I use %D to log processing time).

What is the possible reason, and how can I fix it? It does not seem that the network is slow, since other applications run fast.

Update:

It seems that the nginx upstream_response_time = %D + 1 sec.
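
To break this down further, a log_format along these lines should show which upstream phase the extra second falls into (the format name upstream_timing is just an example, and $upstream_connect_time / $upstream_header_time need a reasonably recent nginx):

# sketch only: format name and variable selection are examples
log_format upstream_timing '$remote_addr [$time_local] "$request" '
                           'request_time=$request_time '
                           'upstream_connect_time=$upstream_connect_time '
                           'upstream_header_time=$upstream_header_time '
                           'upstream_response_time=$upstream_response_time';

access_log /var/log/nginx/upstream_timing.log upstream_timing;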

xingbin
  • This could also depend on how many connections are open between nginx and Tomcat. You can run `netstat -pant` and see if there are a lot of connections on the Tomcat server in such cases – Tarun Lalwani Jul 14 '19 at 15:48
  • "It does not seem the network is slow" – based on the information you provided, it most probably is a networking and/or I/O and/or virtualization issue. Note that in nginx you'll see the time for the whole call (nginx -> network -> ... -> vm of tomcat -> jvm of tomcat -> tomcat processing -> jvm/vm of tomcat -> network ... -> nginx). – D. Kovács Jul 15 '19 at 10:43
  • I would say it is an I/O issue, but I do not know how to monitor it. The network is pretty fast; all other applications on the network behave fast. – xingbin Jul 15 '19 at 11:46
  • Is the timing real? Were you able to reproduce the issue yourself? It could be a client side problem. – LMC Jul 15 '19 at 18:18
  • If it's a client-side problem you should likewise see Send-Q buffers lingering for more than 50 ms: `ss -4 -nt '( dst :443 )' | grep -vE 'ESTAB +0 +0' | sort -nr -k 3,3`, which prints lines like `State Recv-Q Send-Q Local Address:Port Peer Address:Port` / `ESTAB 0 695 192.168.0.4:57432 192.168.0.2:443`. – LMC Jul 15 '19 at 20:23
  • @LuisMuñoz It's hard to reproduce it manually. This happens when there are lots of requests. But I can get the time delay from the log. – xingbin Jul 16 '19 at 01:15
  • How about adding more `*_time` variables from the upstream module (https://nginx.org/en/docs/http/ngx_http_upstream_module.html#variables)? This could help show in which phase nginx spends the most time. You can find some examples in this blog post: https://docs.nginx.com/nginx/admin-guide/monitoring/logging/ – Aleksandar Jul 17 '19 at 13:07
  • I would prefer to start with some sort of test that replaces nginx, to ensure that Tomcat's numbers are completely accurate. Use a `curl` command, for example: replay a certain request against Tomcat and see how much time it takes (a sketch follows below these comments). If that doesn't help, I'd look at a lower level, e.g. the network packets with `tcpdump` or Wireshark, and check when the request and response packets appear, comparing their timings with the logs. The packets alone won't show any network issues, but their timing will pinpoint something. – ahasbini Jul 17 '19 at 20:01
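
Following up on the `curl` suggestion above, replaying one request directly against a Tomcat with per-phase timings could look like this (host, port, and path are placeholders):

curl -o /dev/null -s \
     -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
     http://tomcat-host:8080/some/path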

2 Answers


My hypothesis for your observation is packet loss. This seems the most probable cause to me, since you say it happens when there are many requests. To test this, you can monitor the traffic, e.g. with tcpdump or iftop. If you are on Ubuntu, you can install and start the latter with

sudo apt-get install iftop
sudo iftop

There are many other network monitoring solutions for Linux; the amazing Wireshark works on all operating systems.
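
For example, to capture only the traffic between nginx and one Tomcat for later inspection in Wireshark, something along these lines should do (eth1 and port 8080 are placeholders for your interface and connector port):

# -s 0 captures full packets; open the .pcap in Wireshark afterwards
sudo tcpdump -i eth1 -s 0 -w nginx_tomcat.pcap 'tcp port 8080'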

One reason for packet loss can be collisions, which you can check with `ifconfig [interface]` on Linux:

me@mymachine:~$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr ff:41:8d:ef:41:8d  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 <-------------------------- check here ---
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
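
If your distribution no longer ships ifconfig, the same RX/TX error, drop and collision counters are shown by iproute2 (eth1 again being just an example interface):

ip -s link show eth1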

Are the Tomcats and the Nginx on the same physical (or virtual) machine?

Further reading

B--rian

Servers normally queue requests until a thread is available to handle them. If there are many requests in the queue but only a few threads, a single thread might handle each request quite fast, but if you add the time the request spent in the queue, the consumer sees a much longer time.

See: How to increase number of threads in tomcat thread pool?

Measuring the number of queued requests for tomcat

See if you can increase the number of threads or decrease the acceptCount, but keep in mind that the number of other resources, such as database connections, might also need to be increased. Also keep in mind that more threads can mean more competition for resources.
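
As a rough sketch of where these knobs live, the HTTP connector in Tomcat's conf/server.xml looks something like this (the port and the numbers below are only illustrative, not recommendations):

<!-- illustrative values only: tune maxThreads/acceptCount for your load -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="400"
           acceptCount="100" />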

It might be worth trying to change these parameters. Normally the access log should also include the time the request was queued as well as handled, but I am not sure.

aschoerk
  • Does HTTP1.1 or HTTP1.0 make a difference? – xingbin Jul 21 '19 at 09:43
  • In my opinion, no. Except that with persistent connections, when tuning your Tomcat you must also keep in mind the maximum number of connections it can serve. If it is possible for you, I would experiment with the parameters to see whether my theory is correct. As I already wrote, I am not sure where the Tomcat access log measures, before or after the queue. – aschoerk Jul 21 '19 at 09:48