Amazon S3 interprets my binary data as non-UTF-8 and modifies it when I write to a bucket.
Example using the official S3 JavaScript client:
// Setup (assumed): the official AWS SDK for JavaScript.
var AWS = require( "aws-sdk" );
var s3 = new AWS.S3();

// Decode the base64 PNG to raw bytes, then to a "binary" (latin1) string.
var png_file = new Buffer( "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==", "base64" ).toString( "binary" );

s3.putObject( {
    Bucket: bucket,
    Key: prefix + file,
    ContentType: "image/png;charset=utf-8",
    CacheControl: "public, max-age=31536000",
    Body: png_file
    // , ContentLength: png_file.length
}, function( e ) {
    if ( e ) {
        console.log( e );
    } else {
        // Read the object back and log the size S3 reports for it.
        s3.getObject( {
            Bucket: bucket,
            Key: prefix + file
        }, function( e, v ) {
            if ( e ) {
                console.log( e );
            } else {
                console.log( v.ContentLength );
            }
        } );
    }
} );
This logs a ContentLength of 105, while the original png_file has a length of 85. S3 somehow modifies my file, and I think it has to do with charsets.
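To check whether the 85 → 105 change could come from UTF-8 re-encoding of the "binary" string rather than from S3 itself, the two lengths can be compared locally (a minimal sketch reusing png_file from the snippet above; Buffer.byteLength is standard Node):

console.log( png_file.length );                        // character count of the latin1 string: 85
console.log( Buffer.byteLength( png_file, "utf8" ) );  // byte count after UTF-8 re-encoding; every char >= 0x80 becomes two bytes

If that second number is 105, the ContentLength reported by getObject() matches a UTF-8 encoding of the string body rather than the original bytes.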
If I uncomment the ContentLength line, I get a 400 error on putObject(): "The Content-MD5 you specified did not match what we received."
I get the same result if I calculate the MD5 hash myself (instead of letting the S3 library do it) with ContentMD5: crypto.createHash( "md5" ).update( png_file ).digest( "base64" ). This seems to confirm a difference between the data I send and the data S3 receives.
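The digest also depends on how the string gets encoded before hashing. A sketch (assuming a Node version where hash.update() defaults to UTF-8 for string input; older versions defaulted to "binary"):

var crypto = require( "crypto" );

// Digest of the original bytes.
var md5_bytes = crypto.createHash( "md5" ).update( new Buffer( png_file, "binary" ) ).digest( "base64" );

// Digest of the string: update() re-encodes it first, so this can differ
// from md5_bytes whenever the string contains characters >= 0x80.
var md5_string = crypto.createHash( "md5" ).update( png_file ).digest( "base64" );

console.log( md5_bytes === md5_string );

If the two digests differ, the Content-MD5 I declare is computed over different bytes than the ones the request body ends up carrying, which would match the 400 error above.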
I have read a similarly titled issue, but it didn't solve the problem.