0

Any friendly suggestions on why we might be receiving these 500 status' from tomcat for file downloads that seemingly started out successfully (200 status)?

We are using a ExtendedAccessLogValve to log all requests to our tomcat server.

Here is the pattern:

#Fields: date time time-taken sc-status cs-method x-H(scheme) cs(Host) cs-uri-stem cs-uri-query sc(Content-Length) bytes c-ip x-R(access_metrics_username) x-R(access_metrics_openid) x-R(access_metrics_userid) x-R(access_metrics_sessionid) cs(Referer) cs(User-Agent)

#Version: 2.0

#Software: Apache Tomcat/7.0.76

Then here is an example 500 status response for a file download that started the download and failed after sending 1219416 bytes (1.1 megs) of data.

2019-09-24 06:00:14 200.109 500 GET https "www.{HOST_NAME}.org" /api/v1/archive/request/{UUID}/file/{UUID} api-token={API_TOKEN} "7600353549" 1219416 xxx.xx.xxx.181 "{USER_NAME}" "{USER_OPENID}" "{USER_UUID}" - "{REFERER}" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"

Here is an explanation of all our fields:

  • date
  • time
  • time-taken, the amount of time the request took from start to finish
  • sc-status, the http status code
  • cs_method, the http request method
  • x-H(scheme), the http protocol used, always https for us
  • cs(Host), website host
  • cs-uri-stem, requested url
  • cs-uri-query, query parameters
  • sc(Content-Length), the number of expected bytes in the response
  • bytes, the number of bytes sent
  • c-ip, client ip address
  • x-R(access_metrics_username), the user's username
  • x-R(access_metrics_openid), the user's openid
  • x-R(access_metrics_userid), the user's database uuid
  • x-R(access_metrics_sessionid), the JSESSIONID value
  • cs(Referer)
  • cs(User-Agent)

So, the user successfully started a download that lasted for 200.109 seconds and successfully downloaded 1.1 megs of data and then something happened where this request then errored out and tomcat reported the response as a 500.

We have not been able to figure out what happened to this request or how to reproduce it.

We have tried the following:

  • Starting a download and then canceling it from the browser. The access logs report a 200 status code for that request.
  • Started a download and moved the file on disk from where tomcat was downloading it.

Any thoughts or suggestions on what is happening or how to reproduce this problem would be greatly appreciated.

Update From Comment Questions

Q: Are there any exceptions in your logs?

A: No. But the answer is more complicated than that...

We capture most exceptions by an ExcpetionFilter that has a whitelist in it. Meaning the following exceptions are not logged/mailed to the development team:

  • org.apache.catalina.connector.ClientAbortException
  • java.net.SocketException
  • xxx.xxx.service.security.xxxAccessDeniedException
  • org.apache.commons.fileupload.FileUploadBase$IOFileUploadException

We do this so we do not inundated ourselves with exception emails and potentially bring down our email servers.

I have added logging inside of the Whitelist filter to see if these exceptions are being thrown.

Edit

I forgot to mention in the first part of the email that we actually stream the files ourselves instead of letting tomcat do this work for us. It is kind of difficult to explain why we're doing that, but we are.

To stream the file we are using the FileCopyUtils utility class from Spring.

This call is surrounded by a try/catch that just turns around and rethrows the checked exceptions with a RuntimeException.

try {

    FileCopyUtils.copy(new FileInputStream(file), response.getOutputStream());

} catch (FileNotFoundException | IOException e) {

    throw new RuntimeException(e);
}

There are no other try/catch statements in the calling chain.

Community
  • 1
  • 1
hooknc
  • 4,854
  • 5
  • 31
  • 60
  • How frequently do you see these situations? – Christopher Schultz Oct 16 '19 at 19:50
  • Well, that is a good question. We have been receiving hundreds of thousands (maybe a million total) of them over the last couple of months. Mainly from ip addresses located in China. – hooknc Oct 16 '19 at 20:08
  • Any log messages in other log files like "client abort" or anything like that? – Christopher Schultz Oct 16 '19 at 21:08
  • Again, good question. Short answer: No. Longer answer: Well, maybe. I will update the question with more information on how we're dealing with exceptions and things like that. – hooknc Oct 16 '19 at 23:06
  • Do things improve if you add an explicit `close()` on the response's `OutputStream` after you perform the `copy()`? You may seriously want to look into how to get Tomcat to use `sendfile` to stream the file for you. It will improve performance and reduce the amount of code you have to babysit. – Christopher Schultz Oct 18 '19 at 18:56
  • @ChristopherSchultz I did not know about `sendfile` and I will look into it more! Thank you for pointing out that behavior in tomcat! – hooknc Oct 18 '19 at 21:20
  • @ChristopherSchultz that `sendfile` behavior in tomcat is amazing. It messes with our download metrics, but it is amazing. Do you know if there is a valve that can be used by NIO that will accurately track the download time and bytes sent? Our current `ExtendedAccessLogValve` just reports back that the entire file was downloaded successfully, which is misinformation. – hooknc Oct 21 '19 at 16:47
  • What part about is it misinformation? Does the client fail? Is the connection dropped? It's possible that Tomcat streamed the whole file out to the connection but the client still failed. Tomcat can't detect the difference between "client went away normally" and "client went away .. not-normally" so what should be reported in the log when all bytes have been written with no errors? – Christopher Schultz Oct 22 '19 at 16:37

0 Answers0