
I'm trying to work out how to handle the following situation: I have a file of any format (it could be txt or jpeg, for example), and I upload it to a bucket with an AmazonS3 object:

@Autowired
private AmazonS3 amazonS3;

private void uploadFile(final String bucket, final String key, final String fileName) {
    File file = new File(fileName);
    amazonS3.putObject(bucket, key, file);
}

Then I can get this file in this way:

S3Object s3Object = amazonS3.getObject(bucket, key);

How can I check that my local file and the file I downloaded from S3 are the same? Is there any code for comparing them? Thanks in advance.

Ilya
  • Compare in what sense? Content, file size, ...? – Marcin Jun 15 '21 at 22:59
  • Download the object and then compare the original file with the downloaded file: https://stackoverflow.com/questions/27379059/determine-if-two-files-store-the-same-content – John Hanley Jun 15 '21 at 23:13
  • @Marcin compare that they are equal – Ilya Jun 16 '21 at 07:34
  • @JohnHanley I saw this topic too. But the problem is that what I download is an S3Object, and I can't compare it with a File the way that question does – Ilya Jun 16 '21 at 07:36

1 Answer

The AWS S3 API has the HeadObject operation, which returns the metadata of an object without downloading it. One of the returned properties is the ETag, which in the simple case contains the MD5 hash digest of the object. You could compare that with the MD5 digest of the local file.

Here is Java code, taken from another Stack Overflow question, that shows how to generate an MD5 hash from a local file.

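import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;

MessageDigest md = MessageDigest.getInstance("MD5");
try (InputStream is = Files.newInputStream(Paths.get("file.txt"));
     DigestInputStream dis = new DigestInputStream(is, md)) {
  // Read the decorated stream (dis) to EOF; every read updates the digest.
  byte[] buffer = new byte[8192];
  while (dis.read(buffer) != -1) { }
}
byte[] digest = md.digest();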
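
To compare the digest with an ETag, it has to be hex-encoded first, since the ETag is a hex string. The following is only a sketch: the class and method names (`ETagCheck`, `md5Hex`, `matchesETag`) are made up for illustration, and it assumes the ETag really is a plain MD5 (single PUT, no KMS or customer-provided keys). It rejects ETags containing a `-`, which is the form S3 uses for multipart uploads, but it cannot detect the encryption cases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ETagCheck {

    /** Returns the lowercase hex MD5 digest of the file's content. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream is = Files.newInputStream(file);
             DigestInputStream dis = new DigestInputStream(is, md)) {
            byte[] buffer = new byte[8192];
            while (dis.read(buffer) != -1) {
                // Nothing to do: reading through dis updates the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Compares an ETag with the MD5 of a local file.
     * Assumption: the ETag is a plain MD5, which is not guaranteed by S3.
     */
    static boolean matchesETag(String eTag, Path file)
            throws IOException, NoSuchAlgorithmException {
        // The raw HTTP header wraps the ETag in double quotes; strip them if present.
        String normalized = eTag.replace("\"", "");
        if (normalized.contains("-")) {
            // Multipart ETags look like "<hash>-<partCount>" and are not an MD5 of the content.
            throw new IllegalArgumentException("Not a plain MD5 ETag: " + eTag);
        }
        return normalized.equalsIgnoreCase(md5Hex(file));
    }
}
```

With the `AmazonS3` client from the question, the ETag could be read without downloading the object via `amazonS3.getObjectMetadata(bucket, key).getETag()` and passed to `matchesETag` together with the local path.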

Update: As mentioned in the comments, the ETag is not always the MD5 hash of the file. This approach can still be used if you are sure that the files were uploaded using simple PUT operations. When does the ETag match the MD5 of the file?

Basically, if the object was uploaded with a single PUT operation and is not encrypted with customer-provided or KMS keys, then the resulting ETag is just the MD5 hex digest of the object. Reference
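
When the ETag cannot be trusted to be an MD5, a fallback is to download the object and compare its content with the local file directly. The sketch below is one way to do that with plain `java.io` streams; `StreamCompare` and `contentEquals` are hypothetical names, not part of the AWS SDK.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamCompare {

    /**
     * Returns true if both streams yield exactly the same bytes.
     * Both streams are closed when the method returns.
     */
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        try (InputStream x = new BufferedInputStream(a);
             InputStream y = new BufferedInputStream(b)) {
            int bx;
            do {
                bx = x.read();
                int by = y.read();
                if (bx != by) {
                    // Different byte, or one stream ended before the other.
                    return false;
                }
            } while (bx != -1);
            return true;
        }
    }
}
```

With the client from the question, the S3 side would be `amazonS3.getObject(bucket, key).getObjectContent()`, which is an `InputStream`, and the local side `new FileInputStream(file)`. Comparing `s3Object.getObjectMetadata().getContentLength()` with `file.length()` first would let unequal sizes fail fast before any bytes are read.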

Arun Kamalanathan
  • The ETag is sometimes the MD5 hash, but not always. – John Hanley Jun 16 '21 at 01:25
  • From [docs](https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/): "Whether the ETag is an MD5 digest depends on how the object was created and encrypted. Because the ETag isn't always an MD5 digest, it can't always be used for verifying the integrity of uploaded files." – Marcin Jun 16 '21 at 01:42