Any friendly suggestions on why we might be receiving these 500 status' from tomcat for file downloads that seemingly started out successfully (200 status)?
We are using a ExtendedAccessLogValve to log all requests to our tomcat server.
Here is the pattern:
#Fields: date time time-taken sc-status cs-method x-H(scheme) cs(Host) cs-uri-stem cs-uri-query sc(Content-Length) bytes c-ip x-R(access_metrics_username) x-R(access_metrics_openid) x-R(access_metrics_userid) x-R(access_metrics_sessionid) cs(Referer) cs(User-Agent)
#Version: 2.0
#Software: Apache Tomcat/7.0.76
Then here is an example 500 status response for a file download that started the download and failed after sending 1219416 bytes (1.1 megs) of data.
2019-09-24 06:00:14 200.109 500 GET https "www.{HOST_NAME}.org" /api/v1/archive/request/{UUID}/file/{UUID} api-token={API_TOKEN} "7600353549" 1219416 xxx.xx.xxx.181 "{USER_NAME}" "{USER_OPENID}" "{USER_UUID}" - "{REFERER}" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
Here is an explanation of all our fields:
- date
- time
- time-taken, the amount of time the request took from start to finish
- sc-status, the http status code
- cs_method, the http request method
- x-H(scheme), the http protocol used, always https for us
- cs(Host), website host
- cs-uri-stem, requested url
- cs-uri-query, query parameters
- sc(Content-Length), the number of expected bytes in the response
- bytes, the number of bytes sent
- c-ip, client ip address
- x-R(access_metrics_username), the user's username
- x-R(access_metrics_openid), the user's openid
- x-R(access_metrics_userid), the user's database uuid
- x-R(access_metrics_sessionid), the JSESSIONID value
- cs(Referer)
- cs(User-Agent)
So, the user successfully started a download that lasted for 200.109 seconds and successfully downloaded 1.1 megs of data and then something happened where this request then errored out and tomcat reported the response as a 500.
We have not been able to figure out what happened to this request or how to reproduce it.
We have tried the following:
- Starting a download and then canceling it from the browser. The access logs report a 200 status code for that request.
- Started a download and moved the file on disk from where tomcat was downloading it.
Any thoughts or suggestions on what is happening or how to reproduce this problem would be greatly appreciated.
Update From Comment Questions
Q: Are there any exceptions in your logs?
A: No. But the answer is more complicated than that...
We capture most exceptions by an ExcpetionFilter that has a whitelist in it. Meaning the following exceptions are not logged/mailed to the development team:
- org.apache.catalina.connector.ClientAbortException
- java.net.SocketException
- xxx.xxx.service.security.xxxAccessDeniedException
- org.apache.commons.fileupload.FileUploadBase$IOFileUploadException
We do this so we do not inundated ourselves with exception emails and potentially bring down our email servers.
I have added logging inside of the Whitelist filter to see if these exceptions are being thrown.
Edit
I forgot to mention in the first part of the email that we actually stream the files ourselves instead of letting tomcat do this work for us. It is kind of difficult to explain why we're doing that, but we are.
To stream the file we are using the FileCopyUtils utility class from Spring.
This call is surrounded by a try/catch that just turns around and rethrows the checked exceptions with a RuntimeException.
try {
FileCopyUtils.copy(new FileInputStream(file), response.getOutputStream());
} catch (FileNotFoundException | IOException e) {
throw new RuntimeException(e);
}
There are no other try/catch statements in the calling chain.