
I’m investigating a problem where Tomcat (7.0.90, 7.0.92) very occasionally returns a response with no HTTP headers.

According to the packets captured by Wireshark, after Tomcat receives a request it returns only a response body: neither a status line nor any HTTP response headers.

This makes the downstream Nginx instance log the error “upstream sent no valid HTTP/1.0 header while reading response header from upstream”, return a 502 error to the client, and close the corresponding HTTP connection between Nginx and Tomcat.

What could cause this behavior? Is there anything that can make Tomcat behave this way? Could something be stripping the HTTP headers under some condition? Or could Wireshark have failed to capture the frames that contain the HTTP headers? Any advice on how to narrow down where the problem lies is also greatly appreciated.

This is a screenshot of Wireshark's "Follow HTTP Stream" showing the problematic response:

[screenshot]

EDIT:

This is a screenshot of the "TCP Stream" view of the relevant part (response only). The chunks in the second-to-last response look fine:

[screenshot]

EDIT2:

I forwarded this question to the Tomcat users mailing list and got some suggestions for further investigation from the developers:

http://tomcat.10.x6.nabble.com/Tomcat-occasionally-returns-a-response-without-HTTP-headers-td5080623.html

But I haven’t found any proper solution yet. I’m still looking for insights to tackle this problem.

Kohei Nozaki
  • Can you share a capture (from Wireshark, for example: Follow HTTP Stream)? – Eugène Adell Nov 27 '18 at 11:00
  • @EugèneAdell Added a screenshot from "Follow HTTP Stream". thanks – Kohei Nozaki Nov 28 '18 at 02:37
  • Can you post your nginx configuration? Specifically, if you're using `keepalive` and a non-default value of `proxy_http_version` within nginx? – cnst Nov 30 '18 at 04:34
  • @cnst I'm using `proxy_http_version 1.1` and `keepalive 100` – Kohei Nozaki Nov 30 '18 at 04:40
  • You would see the result in your last screenshot if you have an application bug where, for whatever reason, it writes back 2 responses for 1 incoming request, be it that you've registered 2 handlers for the same endpoint or simply have a weird bug/race condition. I'd try to reproduce this without nginx in the mix though, so as to eliminate at least one source of error. – nos Dec 06 '18 at 16:25
  • Either the status-line & headers are never produced, or they are sent and swallowed by something. To check the latter case, I would look at the Tomcat access log to check that **all** of the requests are there, with the correct bytes-sent values. If yes, something is swallowing them and it must necessarily be a library of your application. It may look like an obvious check, but it is necessary. What is swallowing them? We'll investigate further, although my bet would be on the Listener. – Eugène Adell Dec 06 '18 at 20:26
  • To my knowledge, only one Listener is able to swallow Tomcat's output, but I could be missing something. [Figure 3-1](https://docs.oracle.com/cd/B14099_19/web.1012/b14017/filters.htm#i1000029) is interesting to see; the Listener has enough power to override the container's output. Anyway, it's hardcore debugging, but exciting. – Eugène Adell Dec 06 '18 at 20:53
  • As I mentioned in https://stackoverflow.com/questions/53496598/tomcat-occasionally-returns-a-response-without-http-headers/53552752#comment94144381_53552752 , I haven’t fully verified yet but I’m almost 100% sure that the root cause of this issue is a bug in the JAX-WS RI library v2.1.3 because after updating the library to v2.1.7 I haven’t seen the issue anymore. I’ll post a self answer after further testing. Thanks everyone – Kohei Nozaki Dec 07 '18 at 00:26
  • @KoheiNozaki thanks a lot for the award. – Eugène Adell Dec 07 '18 at 07:49

3 Answers


The issues you experience stem from pipelining multiple requests over a single connection with the upstream, as explained in yesterday's answer here by Eugène Adell.

Whether this is a bug in nginx, Tomcat, your application, or some interaction of the above would probably be a discussion for another forum, but for now, let's consider what the best solution would be:

Can you post your nginx configuration? Specifically, if you're using keepalive and a non-default value of proxy_http_version within nginx? – cnst 1 hour ago

@cnst I'm using proxy_http_version 1.1 and keepalive 100 – Kohei Nozaki 1 hour ago

As per an earlier answer to an unrelated question here on SO that shares the same configuration parameters as above, you might want to reconsider the reasons behind your use of keepalive between the front-end load balancer (e.g., nginx) and the backend application server (e.g., Tomcat).

As per a keepalive explanation on ServerFault in the context of nginx, the keepalive functionality in the upstream context of nginx wasn't supported until quite late in nginx's development. Why? Because there are very few valid scenarios for using keepalive when it's basically faster to establish a new connection than to wait for an existing one to become available:

  • When the latency between the client and the server is on the order of 50ms+, keepalive makes it possible to reuse the existing TCP connection (and TLS session), resulting in a very significant speedup, because no extra roundtrips are required to get the connection ready for servicing the HTTP requests.

    This is why you should never disable keepalive between the client and nginx (controlled through http://nginx.org/r/keepalive_timeout in http, server and location contexts).

  • But when the latency between the front-end proxy server and the backend application server is on the order of 1ms (0.001s), using keepalive is a recipe for chasing Heisenbugs without reaping any benefits, as the extra 1ms of latency to establish a connection may well be less than the 100ms of latency spent waiting for an existing connection to become available. (This is a gross oversimplification of connection handling, but it just shows you how extremely insignificant any possible benefit of keepalive between the front-end load-balancer and the application server would be, provided both of them live in the same region.)

    This is why using http://nginx.org/r/keepalive in the upstream context is rarely a good idea, unless you really do need it, and have specifically verified that it produces the results you desire, given the points as above.

    (And, just to make it clear, these points are irrespective of what actual software you're using, so, even if you weren't experiencing the problems you experience with your combination of nginx and tomcat, I'd still recommend you not use keepalive between the load-balancer and the application server even if you decide to switch away from either or both of nginx and tomcat.)


My suggestion?

  • The problem wouldn't be reproducible with the default values of http://nginx.org/r/proxy_http_version and http://nginx.org/r/keepalive.

  • If your backend is within 5ms of the front-end, you most certainly aren't even getting any benefits from modifying these directives in the first place, so, unless chasing Heisenbugs is your path, you might as well keep these specific settings at their most sensible defaults (see the configuration sketch below).
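
To make the suggestion concrete, here is a minimal sketch of the kind of nginx proxy configuration being discussed; the upstream name, address, and port are placeholders (not taken from the actual setup), and the commented-out lines are the directives you would remove, or never add, to stay at the defaults:

```nginx
# Hypothetical nginx excerpt; upstream name and address are placeholders.
upstream tomcat_backend {
    server 127.0.0.1:8080;
    # keepalive 100;        # the upstream keepalive pool under discussion; omit to use the default (no keepalive)
}

server {
    listen 80;

    location / {
        proxy_pass http://tomcat_backend;
        # proxy_http_version 1.1;          # only needed when the upstream keepalive pool is enabled
        # proxy_set_header Connection "";  # likewise only relevant together with upstream keepalive
        # With the commented lines removed, nginx uses its defaults:
        # proxy_http_version 1.0 and a fresh connection to Tomcat for every request.
    }
}
```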

cnst
  • I agree with you but that Nginx config is something which I can't control actually.. I'm thinking about disabling HTTP keep-alive on the Tomcat side using `maxKeepAliveRequests="1"` .. – Kohei Nozaki Nov 30 '18 at 08:39
  • @KoheiNozaki, yeah, it'll probably have the same effect, as Tomcat would just close the connection, and would ignore the keepalive settings from the upstream. – cnst Nov 30 '18 at 09:01
  • According to this interesting answer, I still don't understand why Tomcat wouldn't add the status-line and the headers. Between NGINX and Tomcat, is there one and only one connection serving all of the requests from a kept-alive connection, or several ? – Eugène Adell Nov 30 '18 at 09:09
  • @EugèneAdell my guess would be that perhaps there's something in the request that possibly downgrades the connection to HTTP/0.9? – cnst Nov 30 '18 at 10:16
  • @EugèneAdell There are a lot of pooled kept-alive connections between Nginx and Tomcat. Not only one – Kohei Nozaki Nov 30 '18 at 10:42
  • Although disabling keepalive is a good solution for most cases, I’m still looking for the root cause and a proper solution for it because the number of requests is ridiculously huge in my use case and I’ve experienced a few bad effects from disabling keepalive.. – Kohei Nozaki Nov 30 '18 at 10:55
  • @cnst How could Keep-Alive work with HTTP/0.9 ? The Connection: Keep-Alive header comes with HTTP/1.0, and it's by default not required with HTTP/1.1. For 0.9, I don't think this would work – Eugène Adell Nov 30 '18 at 11:49
  • @KoheiNozaki Maybe try another [Connector](http://tomcat.apache.org/tomcat-7.0-doc/config/http.html). I suppose you are using the non-blocking, and maybe the blocking will change the behaviour. – Eugène Adell Nov 30 '18 at 11:50
  • @KoheiNozaki I often use the [Extended Log Valve](http://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Extended_Access_Log_Valve) for my troubleshooting. In your particular case, I would check the bytes received/sent, and the threadname, to see if all is logged correctly. – Eugène Adell Nov 30 '18 at 11:53
  • @EugèneAdell I’m using the APR/Native connector but I’ll give the others a try. – Kohei Nozaki Nov 30 '18 at 12:25
  • @KoheiNozaki what bad effects did you experience by disabling keepalive? pretty sure there shouldn't be any if your servers are within 5ms of each other – cnst Nov 30 '18 at 21:46
  • @cnst The bad effects were some new errors on Nginx which I haven't seen when keep-alive is on. If my memory serves me right (sorry I'm out of office and I'd better double check to be 100% sure), one of them was something like `recv() failed (104: Connection reset by peer)` . And our infra team which is responsible for Nginx is not so happy about turning keep-alive off. Their concern seems to be the increase of the number of HTTP sessions. – Kohei Nozaki Nov 30 '18 at 21:58
  • @KoheiNozaki their concern doesn't make any sense. How could there be an increase in the number of sessions? The error might be expected when keepalive isn't available; they should probably not be turning it on within nginx in the first place; why exactly can it not be turned back to the defaults directly at nginx? – cnst Dec 01 '18 at 00:16
  • @cnst Do you mean that if keepalive had been disabled in the Nginx config (not in the upstream Tomcat config), the ‘recv()’ error wouldn’t have happened? As for their concern, it’s my guess.. I’d better double check what their real concern is with them.. – Kohei Nozaki Dec 01 '18 at 00:30
  • @KoheiNozaki yes, most likely, because the error simply indicates that the connection was closed whereas keepalive was expected, e.g., connection was not expected to be closed… – cnst Dec 01 '18 at 00:40
  • @cnst But if I'm not mistaken, the `maxKeepAliveRequests="1"` parameter in the Tomcat config makes Tomcat always respond with the HTTP response header `Connection: close`. From my understanding, it should have made Nginx know that the connection gets closed afterwards and Nginx wouldn't try to keep using the connection. – Kohei Nozaki Dec 01 '18 at 00:50
  • @KoheiNozaki perhaps there wouldn't be a `Connection: close` if you'd still be getting replies without the headers? ;) – cnst Dec 02 '18 at 00:41
  • @cnst I've double checked that Tomcat adds the `Connection: close` header to responses when `maxKeepAliveRequests=“1”` is set. – Kohei Nozaki Dec 03 '18 at 03:22
  • @KoheiNozaki, yes, of course, I'd imagine that whatever causes the headers to be absent, might still be causing it as such even without keepalive, occasionally; however, I still think you would be better off to disable the keepalive, because, as explained, they're unlikely to be helpful. – cnst Dec 03 '18 at 03:27
  • @KoheiNozaki finally did you try other connectors ? – Eugène Adell Dec 03 '18 at 10:53
  • I've been testing with turning keep-alive off on the Tomcat side but I'm still getting the `upstream sent no valid HTTP/1.0 header while reading response header from upstream` error. Turning keep-alive off doesn't seem to be a workaround.. I've started wondering if there are some compatibility issues between the `jaxws-rt-2.1.3` library and Tomcat 7.. – Kohei Nozaki Dec 04 '18 at 07:08
  • @KoheiNozaki I asked before, but did you check if you have several Listeners in your web.xml ? If so, try ordering them with absolute-ordering. It's not a waste of time to try what solved other issues. – Eugène Adell Dec 04 '18 at 10:54
  • @EugèneAdell Unfortunately there is only one listener defined: `com.sun.xml.ws.transport.http.servlet.WSServletContextListener`. By the way, I tried upgrading the `jaxws-rt` library to v2.1.7 and I see some differences. Details: http://metro.1045641.n5.nabble.com/JAX-WS-RI-2-1-5-returning-malformed-response-tp1063518.html – Kohei Nozaki Dec 05 '18 at 23:49
  • @KoheiNozaki it sounds like when keepalives are used, you end up with 502 errors being given to the client, due to the missing headers. What happens after you disable keepalives, in whichever way? It would seem that without the keepalive, the missing headers simply result in the errors being logged by nginx (because of the missing `Connection: close` due to the occasionally missing headers?), but the clients no longer get 502? That would seem like a good-enough intermediate solution as far as the proxying goes, until you can fix the underlying issue with Tomcat and/or the app fixed? – cnst Dec 06 '18 at 06:12
  • @cnst Clients are still getting 50x errors regardless of whether keep-alive is on, for the same reason (i.e. Nginx still gets malformed responses).. I understand your point about latency, but I must say turning keep-alive off won't be a solution for us. Thank you for your continued support anyway.. – Kohei Nozaki Dec 06 '18 at 06:55
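
For reference, the Tomcat-side keep-alive setting mentioned in the comments above would normally be applied on the HTTP Connector in server.xml. A hypothetical sketch (port, protocol, and other attribute values are illustrative, not taken from the actual configuration):

```xml
<!-- Hypothetical server.xml excerpt; attribute values are illustrative.
     maxKeepAliveRequests="1" disables HTTP keep-alive, so Tomcat answers each
     request with "Connection: close" and closes the connection afterwards. -->
<Connector port="8080"
           protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxKeepAliveRequests="1" />
```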

We see that you are reusing an established connection to send the POST request and that, as you said, the response comes without the status-line and the headers.

after Tomcat receives a request it returns only a response body.

Not exactly. It starts with 5d, which is probably a chunk-size, and this means that the latest "full" response (with status-line and headers) received on this connection contained a "Transfer-Encoding: chunked" header. For some reason, your server still believes the previous response isn't finished by the time it starts sending this new response to your last request.

A missing last-chunk seems confirmed, as the screenshot doesn't show a last-chunk (value = 0) ending the previous response. Note that the last response does end with a last-chunk (the last byte shown is 0).
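
For reference, RFC 2616 §3.6.1 (linked under "Further reading" below) defines the framing the capture should show. An illustrative example (status line, headers, and sizes are made up here):

```
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/xml;charset=utf-8
                      <- blank line (CRLF) ends the headers
5d                    <- chunk-size in hex (0x5d = 93 bytes), followed by CRLF
<93 bytes of data>    <- chunk-data, followed by CRLF
0                     <- last-chunk: a chunk of size zero
                      <- final CRLF terminates the chunked body
```

If the previous response never sent that final `0` last-chunk, anything written afterwards is still parsed as part of the earlier chunked body rather than as a new response.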

What causes this? The previous request isn't technically considered fully answered. It could be a bug in Tomcat, in your web service library, or in your own code. Maybe you're even sending your request too early, before the previous one was completely answered.

Are some bytes missing if you compare the chunk-sizes against what is actually sent to the client? Are all buffers flushed? Beware of line endings (CRLF vs. LF only) too.

One last cause that comes to mind: if your response contains some kind of user input taken from the request, you could be facing HTTP Splitting.

Possible solutions.

It is worth trying to disable chunked encoding at the library level; for example, with Axis2, check the HTTP Transport configuration.

When reusing a connection, check your client code to make sure that you aren't sending a request before you read all of the previous response (to avoid overlapping).

Further reading

RFC 2616 3.6.1 Chunked Transfer Coding

Eugène Adell
  • As for the missing chunks, as far as I see in the "TCP Stream" screen, everything seems to be fine (i.e. nothing is missing) in the second response from the last. A screenshot is attached to the question. – Kohei Nozaki Nov 29 '18 at 08:53
  • As for HTTP Splitting, those packets were captured during a load test with JMeter and everything is prepared by my colleagues. So, I believe it’s not the case. – Kohei Nozaki Nov 29 '18 at 09:09
  • Nice. With this second capture I agree all seems clean, but the status-line and headers are absent from the response. Where's the bug? Can you disable the chunked encoding at the library level? – Eugène Adell Nov 29 '18 at 09:54
  • I googled around but unfortunately I haven’t found any information about how to do that with the Metro library. – Kohei Nozaki Nov 29 '18 at 09:58
  • Maybe you can add a Filter on Tomcat, first for debugging this issue, and maybe later for solving it. Filters can see when a request is accepted and when a response is sent. It would be interesting to see if it sees any response at all. – Eugène Adell Nov 29 '18 at 10:43
  • On the [Tomcat user list](https://tomcat.markmail.org/search/?q=metro+list%3Aorg.apache.tomcat.users#query:metro%20list%3Aorg.apache.tomcat.users+page:2+mid:yq6sklpxwjexq3wm+state:results), they were talking about the Listener ordering. Does it help? – Eugène Adell Nov 29 '18 at 10:49
  • As for the mailing list, I’m not sure.. we don’t use Spring and I don’t see any connection to my problem in that thread. – Kohei Nozaki Nov 29 '18 at 11:07
  • The connection with your problem is the precedence of the different frameworks/libraries that you are using (maybe, I can't be sure, I don't know your application). It looks like Tomcat is not doing its job, and I suggested adding a small filter to log when requests and responses are triggered, to check what really happens. – Eugène Adell Nov 29 '18 at 13:17
  • As for Filter, what kind of Filter do you mean? Something like this? https://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Request_Dumper_Filter – Kohei Nozaki Nov 29 '18 at 13:40
  • No sorry, I mean a [Filter](https://docs.oracle.com/javaee/7/api/javax/servlet/Filter.html) that you implement yourself, not a Tomcat built-in one. What I have in mind, is to use such filter to log each time you receive a request, each time you send a response, in order to check whether Tomcat is able to see or not the response. – Eugène Adell Nov 29 '18 at 14:16
  • It sounds like something Tomcat’s builtin RequestDumperFilter does.. what’s the point of implementing something similar myself? Thank you for your continued support btw.. – Kohei Nozaki Nov 29 '18 at 23:28
  • I don't believe this built-in filter is similar, in the sense of what J2EE specs call a filter. My idea is not exactly to dump the traffic, but to see if Tomcat is working at all for the 2nd and later requests. I'm not sure even this is the right method, we are discovering things here. – Eugène Adell Nov 30 '18 at 06:55
  • I see, but `RequestDumperFilter` implements the `javax.servlet.Filter` interface and to me it's something which is JEE spec compliant.. – Kohei Nozaki Nov 30 '18 at 08:44
  • Ah ok, then it is worth trying it. – Eugène Adell Nov 30 '18 at 08:57
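
A minimal sketch of the kind of diagnostic Filter discussed in the comments above (class name, log output, and mapping are illustrative, not part of the actual application); it logs, per thread, when a request enters the filter chain and what status the container believes it produced, so the log can be compared with the access log and the packet capture:

```java
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical diagnostic filter for Tomcat 7 (Servlet 3.0).
public class ResponseLoggingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) {
        // nothing to initialize for this sketch
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        long start = System.currentTimeMillis();
        try {
            // Hand off to the rest of the chain (and ultimately the JAX-WS servlet).
            chain.doFilter(req, res);
        } finally {
            HttpServletResponse response = (HttpServletResponse) res;
            // getStatus() is available because Tomcat 7 implements Servlet 3.0.
            System.out.println(Thread.currentThread().getName()
                    + " " + request.getMethod() + " " + request.getRequestURI()
                    + " -> status=" + response.getStatus()
                    + " committed=" + response.isCommitted()
                    + " took=" + (System.currentTimeMillis() - start) + "ms");
        }
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}
```

Such a filter would be mapped to the web service URL pattern in web.xml (or via `@WebFilter`), so each logged line can be correlated with the corresponding entry in the access log and the frames seen in Wireshark.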

It turned out that the "sjsxp" library used by JAX-WS RI v2.1.3 is what makes Tomcat behave this way. I tried a different version of JAX-WS RI (v2.1.7), which no longer uses the "sjsxp" library, and it solved the issue.

A very similar issue posted on Metro mailing list: http://metro.1045641.n5.nabble.com/JAX-WS-RI-2-1-5-returning-malformed-response-tp1063518.html
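
If the project is built with Maven (an assumption; the answer doesn't say how jaxws-rt is pulled in), the upgrade amounts to something like the following dependency change:

```xml
<!-- Hypothetical pom.xml excerpt; adjust to match however jaxws-rt is actually declared. -->
<dependency>
    <groupId>com.sun.xml.ws</groupId>
    <artifactId>jaxws-rt</artifactId>
    <version>2.1.7</version>
</dependency>
```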

Kohei Nozaki