I have a very big file (about 50 MB). I upload this file to an S3 directory, which then triggers a Lambda function. I am using TransferManager to do the S3 upload.

I read the AWS documentation about Lambda and S3. It says the Lambda function is triggered when a file is created in S3, but I am still wondering whether "created" means the file has been uploaded completely.

My question: because the file is large, the upload takes some time. Is the Lambda function triggered before or after the file has been uploaded completely? For example, if the network goes down while a large file is uploading, the file may be broken. Will the Lambda function still be triggered?

John Rotenstein
franco phong
  • I think the byte stream will be stored in a buffer and then written as a file, so the file will be created at a certain point in time. If an error occurs, the writing process does not finish and the broken part cannot become a file. – Lamanus Sep 11 '19 at 16:51

1 Answer

While "very large" is a relative term, S3 routinely handles much larger files than that, so AWS has accounted for this case. Lambda events are triggered only after the object has been created completely.

The documentation describes this in more detail.

While this is not stated explicitly, the documentation says:

Amazon S3 invokes your function asynchronously with an event that contains details about the object. The following example shows an event that Amazon S3 sent when a deployment package was uploaded to Amazon S3.

(Emphasis mine)

Since the event includes details about the object, including its size, the object must have been uploaded completely. Otherwise the size would not be known. Here's a sample event that includes the size:

{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "us-east-2",
      "eventTime": "2019-09-03T19:37:27.192Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "AWS:AIDAINPONIXQXHT3IKHL2"
      },
      "requestParameters": {
        "sourceIPAddress": "205.255.255.255"
      },
      "responseElements": {
        "x-amz-request-id": "D82B88E5F771F645",
        "x-amz-id-2": "vlR7PnpV2Ce81l0PRw6jlUpck7Jo5ZsQjryTjKlc5aLWGVHPZLj5NeC6qMa0emYBDXOo6QBU0Wo="
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "828aa6fc-f7b5-4305-8584-487c791949c1",
        "bucket": {
          "name": "lambda-artifacts-deafc19498e3f2df",
          "ownerIdentity": {
            "principalId": "A3I5XTEXAMAI3E"
          },
          "arn": "arn:aws:s3:::lambda-artifacts-deafc19498e3f2df"
        },
        "object": {
          "key": "b21b84d653bb07b05b1e6b33684dc11b",
          "size": 1305107,
          "eTag": "b21b84d653bb07b05b1e6b33684dc11b",
          "sequencer": "0C0F6F405D6ED209E1"
        }
      }
    }
  ]
}
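To make the fields above concrete, here is a minimal sketch of a Python handler that reads the event name, bucket, key, and size from each record. The function names `summarize_s3_records` and `lambda_handler` are my own choices, not something from the documentation. Two things to keep in mind: object keys arrive URL-encoded in the event, and for a large file sent as a multipart upload (which TransferManager may use) the `eventName` would be `ObjectCreated:CompleteMultipartUpload` rather than `ObjectCreated:Put`.

```python
from urllib.parse import unquote_plus


def summarize_s3_records(event):
    """Extract event name, bucket, key, and size from an S3 notification event."""
    summaries = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        summaries.append({
            "event_name": record["eventName"],
            "bucket": s3["bucket"]["name"],
            # Keys are URL-encoded in the event (spaces arrive as '+').
            "key": unquote_plus(s3["object"]["key"]),
            # Size in bytes, as reported by S3 for the created object.
            "size": s3["object"].get("size"),
        })
    return summaries


def lambda_handler(event, context):
    """Entry point: log what S3 reported for each created object."""
    summaries = summarize_s3_records(event)
    for item in summaries:
        print(f"{item['event_name']}: s3://{item['bucket']}/{item['key']} "
              f"({item['size']} bytes)")
    return {"processed": len(summaries)}
```

For the sample event above, this would log a size of 1305107 bytes for the single record.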
Maurice
  • Agreed. Objects in Amazon S3 either exist, or don't exist. There is no concept of a "partial object". The object only exists after the upload successfully completes. – John Rotenstein Sep 11 '19 at 23:57
  • You could argue that during multipart uploads the object partially exists, but since you can't see it via the "standard" APIs (GetObject), I think your statement is fair. – Maurice Sep 12 '19 at 07:38
  • In the past, I did see a broken file in S3 caused by a network issue: a file was created in S3, but its size was zero or very small, like 5 KB or 10 KB (the real size was 50 MB). At that time I wasn't using Lambda functions, so I cannot verify whether Lambda would still have been triggered. If anyone has any experience with this, please share. Otherwise, this answer is the best. – franco phong Sep 12 '19 at 07:46
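
If you are worried about the undersized-object case described in the last comment, one defensive option is to check the size reported in the event before processing. This is only a sketch under assumptions: `is_plausible_upload` and the `min_bytes` threshold are hypothetical names, and the right threshold depends on what your files normally look like.

```python
def is_plausible_upload(record, min_bytes=1):
    """Return True if the record is an ObjectCreated event of at least min_bytes.

    min_bytes is a hypothetical threshold chosen by the caller; it is not
    something S3 or Lambda enforces.
    """
    if not record.get("eventName", "").startswith("ObjectCreated:"):
        return False
    # Treat a missing size as 0 so such records are rejected.
    size = record["s3"]["object"].get("size", 0)
    return size >= min_bytes


def filter_plausible(event, min_bytes=1):
    """Keep only the records that pass the size check."""
    return [r for r in event.get("Records", [])
            if is_plausible_upload(r, min_bytes)]
```

With `min_bytes` set to, say, 1 MB for files expected to be around 50 MB, a 5 KB object would be skipped rather than processed.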