Amazon S3 interprets my binary data as non-UTF-8 and modifies it when I write to a bucket.

Example using the official S3 JavaScript client:

var png_file = new Buffer( "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==", "base64" ).toString( "binary" );
s3.putObject( { 
  Bucket: bucket, 
  Key: prefix + file, 
  ContentType: "image/png;charset=utf-8", 
  CacheControl: "public, max-age=31536000",
  Body: png_file 
  // , ContentLength: png_file.length
}, function( e ){
  if ( e ) {
    console.log( e );
  } else {
    s3.getObject( { 
      Bucket: bucket, 
      Key: prefix + file
    }, function( e, v ) {
      if ( e ) {
        console.log( e )
      } else {
        console.log( v.ContentLength );
      }
    } );
  }
} );

This logs 105, while the original png_file has a length of 85. S3 somehow modifies my file, and I think it has to do with charsets.

If I uncomment the Content-Length line, I get a 400 error on putObject(): The Content-MD5 you specified did not match what we received.

I get the same result if I calculate the MD5 hash myself (instead of letting the S3 library do it) with ContentMD5: crypto.createHash("md5").update(png_file).digest("base64"). This seems to confirm that the data I send differs from the data S3 receives.

I have read a similarly titled issue, but it didn't solve the problem.

ehmicky

2 Answers

S3 putObject() assumes either a Buffer or a UTF-8 string. I should have sent the binary data as is, not as a "binary string", meaning using new Buffer(...) instead of new Buffer(...).toString("binary").
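
The size change can be reproduced locally without calling S3. The following is a minimal sketch, assuming a current Node.js runtime where Buffer.from replaces the deprecated new Buffer constructor; the variable names are only illustrative. It shows that a "binary string" grows when it is encoded as UTF-8, because every byte at or above 0x80 becomes a two-byte sequence.

```javascript
// Sketch only: reproduces the size change locally, no S3 call involved.
const base64Png =
  "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==";

// Raw bytes: this is what should be passed as Body.
const raw = Buffer.from(base64Png, "base64");

// The "binary string" from the question: one character per byte (latin1).
const binaryString = raw.toString("binary");

// Size of that string once it is encoded as UTF-8.
const utf8Length = Buffer.byteLength(binaryString, "utf8");

// Count the bytes at or above 0x80; each needs two bytes in UTF-8.
let highBytes = 0;
for (const byte of raw) {
  if (byte >= 0x80) highBytes += 1;
}

console.log(raw.length);          // 85
console.log(binaryString.length); // 85 (characters, not bytes)
console.log(utf8Length === raw.length + highBytes); // true
```

If the string is treated as UTF-8 on upload, the stored object is larger than the original by the number of high bytes, which is consistent with the 85 versus 105 discrepancy reported in the question.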

ehmicky

It seems unlikely that S3 is actually modifying the content you are uploading. It seems more likely that it's being interpreted incorrectly on download, because this does not seem valid for a PNG:

ContentType: "text/data;charset=utf-8", 

That's not a correct content type for a PNG file. I would suggest that this is what you want:

ContentType: "image/png", 
Michael - sqlbot
  • Sorry, this was a typo from me. I updated the original question. Adding or not `;charset=utf-8` does not modify the output. – ehmicky Nov 08 '14 at 16:44
  • @ehmicky thanks for accepting my answer, but I don't think I really solved the problem for you -- I only spotted a typo, but you appear to have found the real problem and solved it. Accepting your own answer is fine, here, particularly in cases where you really did solve the problem yourself and none of the other answers offered the actual solution. You can "un-accept" my answer, I believe, by clicking the checkmark again. – Michael - sqlbot Nov 10 '14 at 13:12