
I am doing a POC with Amazon Macie. I understood from the documentation that it identifies PII data such as credit card numbers. I even ran an example where I put some valid credit card numbers in a CSV file, uploaded it to an S3 bucket, and Macie identified them.

I want to know whether Macie can identify the same PII data if it is inside a database backup/dump file stored in an S3 bucket. I didn't find anything about this in the documentation.

halfer
Anand


A couple of things are important here.

Macie can only handle certain file types and compression formats

If you specify S3 buckets that include files in a format that Macie doesn't support, Macie doesn't classify those files.

Supported compression and archive formats: https://docs.aws.amazon.com/macie/latest/userguide/macie-compression-archive-formats.html
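Before relying on Macie for a compressed dump, it can help to confirm which compression format the file actually uses, since a `.gz` extension does not guarantee gzip content. The sketch below is a minimal illustration, assuming you already have the first few bytes of the object (for example from a ranged read). It only recognises a handful of standard magic-byte signatures; the `SIGNATURES` table and `sniff_compression` function are hypothetical helpers, and whether Macie supports each detected format must still be checked against the page linked above.

```python
# Standard magic-byte signatures for a few common compression/archive formats.
# This is NOT a list of Macie-supported formats; check the Macie docs for that.
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff_compression(header: bytes) -> str:
    """Return the format implied by the first bytes of a file,
    or 'none' if no known signature matches (e.g. a plain-text SQL dump)."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "none"


if __name__ == "__main__":
    # A gzip stream always starts with 0x1f 0x8b.
    print(sniff_compression(b"\x1f\x8b\x08\x00"))        # gzip
    print(sniff_compression(b"-- MySQL dump 10.13\n"))   # none
```

Sniffing content rather than trusting the file extension matters for dumps, because backup tools often write compressed output under a generic name.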

Encrypted Objects

Macie can only handle certain types of encrypted Amazon S3 objects. See the following link for more details: https://docs.aws.amazon.com/macie/latest/userguide/macie-integration.html#macie-encrypted-objects

Macie Limits

Macie has a default limit on the amount of data that it can classify in an account. After this limit is reached, Macie stops classifying data. The default data classification limit is 3 TB, and it can be increased on request.

Macie's content classification engine processes up to the first 20 MB of an S3 object.
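To see what these two limits mean for a bucket of dump files, the arithmetic can be sketched as below. This is a rough estimate under stated assumptions: "20 MB" and "3 TB" are read as binary units (MiB/TiB), and the account limit is assumed to count only the bytes actually classified. The constants and the `classification_coverage` function are hypothetical; the Macie documentation is the authority on the exact definitions.

```python
from typing import Iterable, Tuple

# Assumption: binary units; check the Macie docs for the exact definitions.
MAX_BYTES_PER_OBJECT = 20 * 1024**2   # only the first 20 MB of an object is classified
ACCOUNT_LIMIT_BYTES = 3 * 1024**4     # default account classification limit (3 TB)


def classification_coverage(object_sizes: Iterable[int]) -> Tuple[int, int, bool]:
    """Given object sizes in bytes, return (bytes classified, bytes never
    examined, whether the classified total exceeds the default account limit)."""
    sizes = list(object_sizes)
    classified = sum(min(size, MAX_BYTES_PER_OBJECT) for size in sizes)
    skipped = sum(sizes) - classified
    return classified, skipped, classified > ACCOUNT_LIMIT_BYTES


if __name__ == "__main__":
    # A 500 MiB database dump plus a 5 MiB CSV:
    # 20 MiB + 5 MiB classified, 480 MiB of the dump never examined.
    print(classification_coverage([500 * 1024**2, 5 * 1024**2]))
```

For the example sizes this prints `(26214400, 503316480, False)`: most of a large dump is never looked at, even though the account limit is nowhere near reached.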

So, specifically: if your dump is compressed, and the file inside the archive is in a supported format, then yes, Macie can classify it. Note, however, that it will only classify the first 20 MB of the file, which is a problem if the file is large.

I typically use AWS Lambda to split a large file into files just under 20 MB each. You still need to think about how, once you have X files, you take a record from a file that Macie classified as containing PII and map it back to something usable.
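The splitting approach above can be sketched as a pure function that groups lines into chunks just under 20 MB and records which lines of the original dump each chunk holds, so a PII finding on chunk N can be traced back. This is a minimal sketch, not the answerer's actual Lambda: it assumes one record per line (common for INSERT-per-row dumps, but not guaranteed), and the 64 KiB safety margin, `ChunkInfo`, and `split_lines` are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Assumption: stay a little under 20 MiB so the whole chunk is classified.
CHUNK_LIMIT = 20 * 1024**2 - 64 * 1024


@dataclass
class ChunkInfo:
    index: int        # chunk number, e.g. used to build the chunk's S3 key
    first_line: int   # 1-based line number in the original dump
    last_line: int    # inclusive


def split_lines(lines: Iterable[bytes],
                limit: int = CHUNK_LIMIT) -> Iterator[Tuple[ChunkInfo, bytes]]:
    """Group lines (with their newlines) into chunks of at most `limit` bytes
    without splitting a line. A single line longer than `limit` becomes its
    own oversized chunk, which Macie would only partially classify."""
    buf: List[bytes] = []
    size = 0
    first = 1
    index = 0
    line_no = 0
    for line_no, line in enumerate(lines, start=1):
        # Flush the current chunk if adding this line would exceed the limit.
        if buf and size + len(line) > limit:
            yield ChunkInfo(index, first, line_no - 1), b"".join(buf)
            index += 1
            buf, size, first = [], 0, line_no
        buf.append(line)
        size += len(line)
    if buf:
        yield ChunkInfo(index, first, line_no), b"".join(buf)


if __name__ == "__main__":
    dump = [b"aaaa\n", b"bb\n", b"cccccc\n", b"d\n"]
    for info, data in split_lines(dump, limit=8):
        print(info, data)
```

Inside a Lambda, you could stream the dump's lines from S3 (for example via the boto3 S3 client's `get_object`), write each chunk back with `put_object`, and store the `ChunkInfo` entries as a manifest next to the chunks. A Macie finding on a chunk then points to a line range in the original dump. Dumps that write many rows into a single INSERT line would need a smarter, record-aware splitter.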