2

The method below in my Java Spring application streams a file directly to an Amazon S3 bucket. From my research, using streams should make uploading large files (> 100MB videos in my use case) more memory-efficient. However, when testing the method with a 25MB file, the memory usage of my Java Spring application (running in a Kubernetes cluster) spiked by 200MB! With a 200MB file, memory spiked again to ~2GB. No out-of-memory exceptions were thrown, but the memory usage never drops back down. Why does this happen?

public void uploadFile(MultipartFile file, String saveFileName) {
        try {
            ObjectMetadata metadata = new ObjectMetadata();

            if (file.getContentType() != null){
                metadata.setContentType(file.getContentType());
            }

            metadata.setContentLength(file.getSize());

            saveFileName = saveFileName.replaceAll(" ", "");

            InputStream stream = file.getInputStream();

            PutObjectRequest request = new PutObjectRequest(bucketName, saveFileName, stream, metadata);

            s3client.putObject(request);

            stream.close();
        } catch (AmazonClientException | IOException exception) {
            // handle exception
        }
    }
Andy Tang
  • You should put stream.close() inside a finally block. Have a good day! – Yunus Emre Güler Aug 25 '20 at 07:25
  • Please have a look at my answer https://stackoverflow.com/a/64263423/1704634 , it uses an S3OutputStream which automatically switches to multipart uploads in case the stream is too large. Currently uses a 10MB buffer, but this can be configured smaller/larger. – blagerweij Oct 08 '20 at 15:42

1 Answer

3

There are multiple ways to handle large file uploads.

  1. Write the bytes to disk first and upload the file to S3 afterwards, possibly from a background task.
  2. Keep the file in memory and upload the object directly (a bad option, unless you enforce a very low file-size limit).
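The first option could be sketched roughly like this, reusing the `s3client` and `bucketName` fields from the question (the method name and temp-file handling here are assumptions, not a definitive implementation):

```java
// Sketch: spool the multipart upload to a temp file, then hand the File
// to the SDK. Assumes the same s3client/bucketName fields as the question.
public void uploadViaTempFile(MultipartFile file, String saveFileName) throws IOException {
    File tmp = File.createTempFile("upload-", ".tmp");
    try {
        // Spring writes the uploaded part to disk instead of holding it in heap.
        file.transferTo(tmp);
        // With a File the SDK knows the content length and can retry safely.
        s3client.putObject(new PutObjectRequest(bucketName, saveFileName, tmp));
    } finally {
        tmp.delete(); // always clean up the spool file
    }
}
```

With a `File` (rather than a raw `InputStream`) the SDK knows the content length up front and can re-read the data on retries, so nothing large needs to sit in heap memory.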

Take a look at this git repo to see how the above methods can be achieved.

Your exact use case isn't clear here, but if you also control the UI, consider uploading the files directly from the browser using pre-signed S3 URLs.
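A pre-signed PUT URL can be generated server-side with the v1 SDK along these lines (field names follow the question; the 15-minute expiry is an arbitrary assumption):

```java
// Sketch: issue a pre-signed PUT URL so the client uploads straight to S3
// and the file never passes through the Spring application.
public URL presignedUploadUrl(String saveFileName) {
    // Give the client 15 minutes to complete the upload (assumption).
    Date expiration = new Date(System.currentTimeMillis() + 15 * 60 * 1000);
    GeneratePresignedUrlRequest request =
            new GeneratePresignedUrlRequest(bucketName, saveFileName)
                    .withMethod(HttpMethod.PUT)
                    .withExpiration(expiration);
    return s3client.generatePresignedUrl(request);
}
```

The client then does a plain HTTP PUT of the raw bytes to the returned URL, so your application only handles a small JSON exchange instead of the file itself.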

solecoder
    Thanks for referring me to this git repo. It has helped answer a lot of my questions. I might go with the pre-signed s3 URLs route :) – Andy Tang Oct 06 '19 at 09:49