94

I am uploading a file to S3 using Java - this is what I have so far:

AmazonS3 s3 = new AmazonS3Client(new BasicAWSCredentials("XX","YY"));

List<Bucket> buckets = s3.listBuckets();

s3.putObject(new PutObjectRequest(buckets.get(0).getName(), fileName, stream, new ObjectMetadata()));

The file is uploaded, but a WARNING is raised when I don't set the content length:

com.amazonaws.services.s3.AmazonS3Client putObject: No content length specified for stream data. Stream contents will be buffered in memory and could result in out of memory errors.

What I am uploading is a file, and the stream variable is an InputStream from which I can get the byte array like this: IOUtils.toByteArray(stream).

So when I try to set the content length and MD5 (taken from here) like this:

// get MD5 base64 hash
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
messageDigest.reset();
messageDigest.update(IOUtils.toByteArray(stream));
byte[] resultByte = messageDigest.digest();
String hashtext = new String(Hex.encodeHex(resultByte));

ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(IOUtils.toByteArray(stream).length);
meta.setContentMD5(hashtext);

It causes the following error to come back from S3:

The Content-MD5 you specified was invalid.

What am I doing wrong?

Any help appreciated!

P.S. I am on Google App Engine - I cannot write the file to disk or create a temp file because AppEngine does not support FileOutputStream.

JohnIdol
  • IOUtils.toByteArray reads the whole file into memory, so depending on the size of your files it may not be the right solution. A better approach is to ask the file provider for the file size and then stream the file to S3; that way you don't have to hold the whole file in memory, since you already have the size information – Hamdi Nov 21 '20 at 14:15

8 Answers

73

Because the original question was never answered, and I ran into this same problem: the solution to the MD5 issue is that S3 doesn't want the hex-encoded MD5 string we normally think of.

Instead, I had to do this:

// content is a passed in InputStream
byte[] resultByte = DigestUtils.md5(content);
String streamMD5 = new String(Base64.encodeBase64(resultByte));
metaData.setContentMD5(streamMD5);

Essentially what they want for the MD5 value is the Base64-encoded raw MD5 byte array, not the hex string. Once I switched to this it started working great for me.
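
For completeness, here is a minimal sketch that ties this back to the original question: read the stream into a byte array once, derive both the content length and the Base64 MD5 from that array, and upload from a fresh ByteArrayInputStream so the SDK isn't handed an already-consumed stream. bucketName stands in for the buckets.get(0).getName() call from the question; the rest is standard commons-codec/commons-io usage.

import java.io.ByteArrayInputStream;

import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.io.IOUtils;

import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

// Read the original InputStream exactly once.
byte[] bytes = IOUtils.toByteArray(stream);

ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(bytes.length);
// S3 expects the Base64 encoding of the raw MD5 digest, not a hex string.
meta.setContentMD5(Base64.encodeBase64String(DigestUtils.md5(bytes)));

// Upload from a fresh stream over the same bytes; the original stream is now spent.
s3.putObject(new PutObjectRequest(bucketName, fileName, new ByteArrayInputStream(bytes), meta));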

Marcelo Glasberg
  • And we have a winnahhhh! Thanks for the extra effort answering the MD5 issue. That's the part I was digging for... – Geek Stocks Oct 05 '13 at 03:34
  • What is content in this case? i didn't get it. I am having the same warning. A little help, please.? – Shaonline Mar 24 '16 at 12:28
  • @Shaonline content is the inputStream – sirvon Mar 24 '16 at 23:37
  • Any way to convert from Hex back to the MD5 byte-array? That's what we store in our DB. – Joel May 26 '16 at 13:51
  • Please note that meta.setContentLength(IOUtils.toByteArray(stream).length); consumes the InputStream. When the AWS API tries to read it, it's zero length and therefore fails. You need to create a new input stream from ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes); – Bernie Lenz Jun 28 '17 at 17:58
  • Use com.amazonaws.util.Md5Utils.md5AsBase64(byte[]) instead. – pwojnowski May 26 '20 at 14:55
46

If all you are trying to do is fix the content length warning from Amazon, then you can just read the bytes from the input stream into a byte array, take its length as a Long, and add that to the metadata.

/*
 * Obtain the Content length of the Input stream for S3 header
 */
byte[] contentBytes = null;
try {
    InputStream is = event.getFile().getInputstream();
    contentBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s", e.getMessage());
}

Long contentLength = Long.valueOf(contentBytes.length);

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentLength);

/*
 * Reobtain the tmp uploaded file as input stream
 */
InputStream inputStream = event.getFile().getInputstream();

/*
 * Put the object in S3
 */
try {

    s3client.putObject(new PutObjectRequest(bucketName, keyName, inputStream, metadata));

} catch (AmazonServiceException ase) {
    System.out.println("Error Message:    " + ase.getMessage());
    System.out.println("HTTP Status Code: " + ase.getStatusCode());
    System.out.println("AWS Error Code:   " + ase.getErrorCode());
    System.out.println("Error Type:       " + ase.getErrorType());
    System.out.println("Request ID:       " + ase.getRequestId());
} catch (AmazonClientException ace) {
    System.out.println("Error Message: " + ace.getMessage());
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
}

You'll need to read the input stream twice using this exact method, so if you are uploading a very large file you might want to read it once into a byte array and then upload from that array.
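
A rough sketch of that single-read variant, reusing the event.getFile() accessor and the s3client, bucketName and keyName variables from the snippet above:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;

import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

// Read the uploaded file exactly once.
byte[] contentBytes = null;
try (InputStream is = event.getFile().getInputstream()) {
    contentBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s", e.getMessage());
}

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentBytes.length);

// Upload from the in-memory copy instead of opening the stream a second time.
s3client.putObject(new PutObjectRequest(bucketName, keyName,
        new ByteArrayInputStream(contentBytes), metadata));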

tarka
37

For uploading, the S3 SDK gives you two relevant PutObjectRequest constructors:

PutObjectRequest(String bucketName, String key, File file)

and

PutObjectRequest(String bucketName, String key, InputStream input, ObjectMetadata metadata)

The InputStream + ObjectMetadata variant needs, at minimum, the content length of your InputStream. If you don't supply it, the SDK will buffer the stream in memory to work it out, which could cause OOM. Alternatively, you can do your own in-memory buffering to get the length, but then you need a second InputStream.

This isn't what the OP asked for (given the limitations of his environment), but it may help someone else, such as me: I find it easier, and safer (if you can write a temp file), to write the InputStream to a temp file and put the temp file. There is no in-memory buffering and no need to create a second InputStream.

AmazonS3 s3Service = new AmazonS3Client(awsCredentials);
File scratchFile = File.createTempFile("prefix", "suffix");
try {
    FileUtils.copyInputStreamToFile(inputStream, scratchFile);    
    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, id, scratchFile);
    PutObjectResult putObjectResult = s3Service.putObject(putObjectRequest);

} finally {
    if(scratchFile.exists()) {
        scratchFile.delete();
    }
}
Peter Dietz
  • Is the second argument in copyInputStreamToFile(inputStream, scratchFile) of type File or OutputStream? – Shaonline Feb 02 '16 at 10:55
  • 2
    Although this is I/O-intensive, I still vote for this, since it may be the best way to avoid OOM on bigger file objects. However, anyone could also read n bytes at a time, create part files, and upload them to S3 separately. – linehrr Feb 14 '18 at 05:49
8

When writing to S3, you need to specify the length of the S3 object to be sure there are no out-of-memory errors.

Using IOUtils.toByteArray(stream) is also prone to OOM errors, because it is backed by a ByteArrayOutputStream.

So the best option is to first write the InputStream to a temp file on local disk and then use that file to upload to S3, specifying the length of the temp file.
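
A rough sketch of that approach (assuming you can write to local disk; the s3client, bucketName, keyName and inputStream names are illustrative):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

Path tempFile = Files.createTempFile("s3-upload-", ".tmp");
try {
    // Stream straight to disk; nothing is buffered in memory.
    Files.copy(inputStream, tempFile, StandardCopyOption.REPLACE_EXISTING);
    // The File overload derives the Content-Length from the file size.
    s3client.putObject(bucketName, keyName, tempFile.toFile());
} finally {
    Files.deleteIfExists(tempFile);
}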

srikanta
  • 2
    Thanks but I am on google app engine (updated question) - cannot write the file to disk, if I could do that I could use the putObject overload that takes a File :( – JohnIdol Dec 02 '11 at 05:30
  • @srikanta Just took your advice. No need to specify the length of temp file. Just pass the temp file as is. – Siya Sosibo Jun 05 '16 at 19:35
  • 1
    FYI the temp file approach is NOT an option if, like me, you want to specify server-side encryption, which is done in the ObjectMetadata. Unfortunately there is no PutObjectRequest(String bucketName, String key, File file, ObjectMetadata metadata) – Kevin Pauli Aug 15 '16 at 21:33
  • @kevin pauli You can do `request.setMetadata();` – dbaq Dec 20 '16 at 00:32
  • I didn't find any other solution than creating a temporary File to hold the content that needs to be sent to S3. A bit sad; it would be nice to be able to provide in-memory data... – Camille Mar 17 '22 at 14:59
6

I am actually doing somewhat the same thing, but against my AWS S3 storage:

Code for the servlet which receives the uploaded file:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

import com.src.code.s3.S3FileUploader;

public class FileUploadHandler extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        doPost(request, response);
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        PrintWriter out = response.getWriter();

        try{
            List<FileItem> multipartfiledata = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);

            //upload to S3
            S3FileUploader s3 = new S3FileUploader();
            String result = s3.fileUploader(multipartfiledata);

            out.print(result);
        } catch(Exception e){
            System.out.println(e.getMessage());
        }
    }
}

Code which uploads this data as an AWS S3 object:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.List;
import java.util.UUID;

import org.apache.commons.fileupload.FileItem;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class S3FileUploader {


    private static String bucketName     = "***NAME OF YOUR BUCKET***";
    private static String keyName        = "Object-"+UUID.randomUUID();

    public String fileUploader(List<FileItem> fileData) throws IOException {
        AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
        String result = "Upload unsuccessful because ";
        try {

            ObjectMetadata omd = new ObjectMetadata();
            omd.setContentType(fileData.get(0).getContentType());
            omd.setContentLength(fileData.get(0).getSize());
            omd.setHeader("filename", fileData.get(0).getName());

            // Wrap the uploaded bytes in a stream; the content length is already set above.
            ByteArrayInputStream bis = new ByteArrayInputStream(fileData.get(0).get());

            s3.putObject(new PutObjectRequest(bucketName, keyName, bis, omd));
            bis.close();

            result = "Uploaded Successfully.";
        } catch (AmazonServiceException ase) {
           System.out.println("Caught an AmazonServiceException, which means your request made it to Amazon S3, but was "
                + "rejected with an error response for some reason.");

           System.out.println("Error Message:    " + ase.getMessage());
           System.out.println("HTTP Status Code: " + ase.getStatusCode());
           System.out.println("AWS Error Code:   " + ase.getErrorCode());
           System.out.println("Error Type:       " + ase.getErrorType());
           System.out.println("Request ID:       " + ase.getRequestId());

           result = result + ase.getMessage();
        } catch (AmazonClientException ace) {
           System.out.println("Caught an AmazonClientException, which means the client encountered an internal error while "
                + "trying to communicate with S3, such as not being able to access the network.");

           result = result + ace.getMessage();
         }catch (Exception e) {
             result = result + e.getMessage();
       }

        return result;
    }
}

Note: I am using an AWS properties file for the credentials.

Hope this helps.

streak
4

I've created a library that uses multipart uploads in the background to avoid buffering everything in memory and also doesn't write to disk: https://github.com/alexmojaki/s3-stream-upload
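
For a sense of what this does under the hood, here is a rough sketch of the same idea written directly against the SDK's low-level multipart API (this is not the library's own API; the s3, inputStream, bucketName and keyName variables are illustrative, and abort/error handling is omitted):

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.IOUtils;

import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Upload in 5 MB parts so only one part is ever held in memory at a time.
InitiateMultipartUploadResult init =
        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucketName, keyName));
List<PartETag> partETags = new ArrayList<PartETag>();
byte[] buffer = new byte[5 * 1024 * 1024]; // 5 MB is the minimum size for every part but the last
int partNumber = 1;
int read;
// commons-io's IOUtils.read fills the buffer (or hits EOF) before returning.
while ((read = IOUtils.read(inputStream, buffer)) > 0) {
    partETags.add(s3.uploadPart(new UploadPartRequest()
            .withBucketName(bucketName).withKey(keyName)
            .withUploadId(init.getUploadId())
            .withPartNumber(partNumber++)
            .withInputStream(new ByteArrayInputStream(buffer, 0, read))
            .withPartSize(read))
            .getPartETag());
}
s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        bucketName, keyName, init.getUploadId(), partETags));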

Alex Hall
-2

Just passing the File object to the putObject method worked for me. If you are getting a stream, try writing it to a temp file before passing it on to S3.

amazonS3.putObject(bucketName, id,fileObject);

I am using AWS SDK v1.11.414.

The answer at https://stackoverflow.com/a/35904801/2373449 helped me

Vikram
  • 3
    If you have a stream, you want to use that stream. Writing the stream to a (temp) file just to get its data is inefficient and gives you additional headaches (deleting the file, disk usage) – devstructor Jul 02 '20 at 13:39
  • This will not allow you to pass metadata, such as encryption, which is common practice when storing in AWS – user1412523 Oct 13 '20 at 08:14
-16

Adding the log4j-1.2.12.jar file resolved the issue for me.

Rajesh
  • 3
    -1 : I guess this will just hide the log warning but not solve the error itself. Sorry to be so harsh, it's your first answer after all, but this does not solve this question. – romualdr Jan 06 '17 at 10:01