
I have a Spring MVC application deployed on Apache Tomcat 9. Is it possible to calculate a checksum while the file is being uploaded by the user? Where should I even begin looking if I wanted to do this?

To be specific, I understand that I can calculate a checksum inside a @Controller class in Spring MVC. However, by that point the file has already been completely uploaded to the temporary file upload directory. I am specifically asking whether there is a way to calculate the checksum of the individual parts while a MultipartFile object is being uploaded/created. Do I have to override Apache Tomcat's behaviour? Is it possible to do this without modifying the Tomcat source code? If not, what is the potential impact of such a modification?

I do not understand the intricacies of webservers and would appreciate it if someone could tell me where to start looking.

M. Deinum
  • You still need the whole payload to calculate the checksum, so what would the gain be? Also there are no multiple parts in a `MultipartFile`, the `Multipart` is about the request which consists of multiple parts (it can contain multiple files or a file and a form). – M. Deinum Jun 25 '21 at 07:14
  • What is your ultimate goal? Why can't you compute the checksum after the files have been stored on disk (or memory)? – Piotr P. Karwasz Jun 25 '21 at 09:38
  • I assumed (perhaps mistakenly) that Tomcat has to manage the slices of the file as it is being uploaded and communicate information back to the client, which displays any "upload progress bars" related to the upload. This would imply that the file is uploaded in multiple parts. Is that not so? – saquib-khan Jun 25 '21 at 16:11
  • The ultimate goal/gain is to save time during upload for large files. Currently, we calculate checksum in a separate thread, but that causes the checksum to not be available if the file is being saved on the object store. Ideally, we want the checksum to be calculated as the file is being uploaded to Tomcat and then use that checksum as metadata when it is being moved to object store by the Spring application. Does this answer your questions? – saquib-khan Jun 25 '21 at 16:20

1 Answer


Yes, the checksum cannot be finalized before you have all of the file content. Do you have access to the front end? If you do, you can calculate the checksum in JavaScript and send it in a request header.

But in this case, you may not be able to use the MultipartFile component in Spring. You might need to use the Apache Commons FileUpload library directly, which gives you lower-level control and lets you read the headers before you start streaming the file content.
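
As a rough illustration of what "streaming the file content" could look like once you hold the raw part stream, here is a minimal sketch that uses only the JDK. It wraps an `InputStream` in a `DigestInputStream` so the MD5 digest is updated as bytes are copied to their destination. The class and method names (`StreamingChecksum`, `copyAndDigest`) are hypothetical, and the in-memory streams in `main` are stand-ins; in a real upload the input would presumably be the part's stream obtained from the upload library, and the output would be the temporary file or object store.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingChecksum {

    /**
     * Copies everything from in to out and returns the hex MD5 of the
     * bytes that passed through. The caller owns (and closes) both streams.
     */
    static String copyAndDigest(InputStream in, OutputStream out) throws IOException {
        MessageDigest md5 = newMd5();
        // DigestInputStream updates the digest as a side effect of each read.
        DigestInputStream din = new DigestInputStream(in, md5);
        byte[] buffer = new byte[8192];
        int n;
        while ((n = din.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return toHex(md5.digest());
    }

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the upload stream and the storage destination.
        byte[] payload = "abc".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream stored = new ByteArrayOutputStream();

        String checksum = copyAndDigest(new ByteArrayInputStream(payload), stored);
        System.out.println(checksum); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

With this shape the checksum is available as soon as the last byte has been written, without a second pass over the stored file.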

Sam YC
  • I do have access to the front end. However, we have to allow uploads of large files (up to 200GB based on recent history). How would this impact your idea of using JavaScript to calculate the checksum? Could this potentially crash (older) browsers on slower computers? Is this the Apache upload library you mentioned: https://commons.apache.org/proper/commons-fileupload/ ? – saquib-khan Jun 25 '21 at 16:44
  • Yes, [`commons-fileupload`](https://mvnrepository.com/artifact/commons-fileupload/commons-fileupload/1.4) is used by many servlet containers to parse `multipart/form-data` requests (Tomcat also has a [repackaged copy](https://tomcat.apache.org/tomcat-9.0-doc/api/org/apache/tomcat/util/http/fileupload/package-summary.html) of it). – Piotr P. Karwasz Jun 25 '21 at 19:19
  • @saquib-khan You can check https://stackoverflow.com/questions/768268/how-to-calculate-md5-hash-of-a-file-using-javascript and run a simple benchmark with the libraries suggested there. However, it will definitely be better than uploading the whole 200GB file to the server just for the checksum. – Sam YC Jun 26 '21 at 06:46
  • @saquib-khan Yes, Apache commons-fileupload is the one. I guess Spring's `MultipartFile` uses it internally as well, but to use it more flexibly and with its native features, you need to use it directly instead of through Spring's `MultipartFile`. – Sam YC Jun 26 '21 at 06:50
  • I am developing a prototype Spring Boot application where I disable the MultipartResolver for Spring and try to go after the uploading file's stream directly. I stepped through Spring code and was able to confirm that Spring does indeed use Apache Commons FileUpload library to manage file upload. So, there is some good news. I will update here once I have a functioning prototype. – saquib-khan Jun 28 '21 at 16:17
  • However, I am deviating from @SamYC's recommendation in one important aspect. I was able to find explanations of how the MD5 checksum works, and it seems that MD5 is incremental (you can calculate the MD5 of a file in slices of 512 bits). So, I intend to calculate MD5 on the server end of the application. Hopefully, I can read the uploading file's stream 512 bits at a time, calculate the checksum, write those bits to the temporary file upload directory... and everything just "works". – saquib-khan Jun 28 '21 at 16:21
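
The incremental property mentioned in the last comment can be checked with a small JDK-only sketch. It feeds the same bytes to `MessageDigest` in chunks of different sizes and compares the result with a one-shot digest. The class and method names (`IncrementalMd5`, `digestInChunks`, `digestAtOnce`) are hypothetical, and the test data is arbitrary. One thing the sketch suggests is that the read size does not need to match MD5's 512-bit block: `MessageDigest.update` accepts chunks of any length, so an ordinary I/O buffer size works.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class IncrementalMd5 {

    /** Feeds data to MD5 in slices of chunkSize bytes and returns the digest. */
    static byte[] digestInChunks(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        MessageDigest md5 = newMd5();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            md5.update(data, off, len); // state carries over between calls
        }
        return md5.digest();
    }

    /** Hashes the whole array in a single call, for comparison. */
    static byte[] digestAtOnce(byte[] data) {
        return newMd5().digest(data);
    }

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Arbitrary sample data standing in for an uploaded file.
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }

        byte[] whole = digestAtOnce(data);
        boolean same = Arrays.equals(whole, digestInChunks(data, 64))    // 512-bit slices
                && Arrays.equals(whole, digestInChunks(data, 7))         // odd-sized slices
                && Arrays.equals(whole, digestInChunks(data, 8192));     // larger than the data
        System.out.println(same); // prints true
    }
}
```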