I have an AWS Lambda function configured to call start_file_transfer on an AWS Transfer Family SFTP connector every X minutes. The problem is that the Lambda sometimes runs again before the previous files have finished downloading, which understandably causes file locking errors.

The obvious solution to me is: when the Lambda runs, check which downloads are still in progress and skip those files.

The start_file_transfer call:

import boto3

transfer = boto3.client("transfer")

transfer.start_file_transfer(
    ConnectorId="[my-connector-id]",
    RetrieveFilePaths=[sftp_file],
    LocalDirectoryPath="/[my-s3-bucket]/[my-s3-key]"
)

returns the following response:

{
    "TransferId": "3296exe9-21fy-063r-a97n-2c91e6u3ab81",
    "ResponseMetadata": {
        "RequestId": "l5au1f5e-v59u-763w-h7gr-9461y268135o",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "date": "Wed, 30 Aug 2023 15:30:30 GMT",
            "content-type": "application/x-amz-json-1.1",
            "content-length": "6429",
            "connection": "keep-alive",
            "x-amzn-requestid": "l5au1f5e-v59u-763w-h7gr-9461y268135o"
        },
        "RetryAttempts": 0
    }
}

and when the transfer operation completes, a CloudWatch log entry is created:

{
    "operation": "RETRIEVE",
    "timestamp": "2023-08-30T16:30:33.227572Z",
    "connector-id": "[my-connector-id]",
    "transfer-id": "3296exe9-21fy-063r-a97n-2c91e6u3ab81",
    "file-transfer-id": "3296exe9-21fy-063r-a97n-2c91e6u3ab81/F6drt7oppd2+87YuTREWWW",
    "url": "sftp://sftp.example.com",
    "file-path": "my_path/file.txt",
    "status-code": "COMPLETED",
    "start-time": "2023-08-30T16:30:32.102939Z",
    "end-time": "2023-08-30T16:30:32.886031Z",
    "account-id": "999999999999",
    "connector-arn": "arn:aws:transfer:[my-region]:999999999999:connector/[my-connector-id]",
    "local-directory-path": "/[my-s3-bucket]/[my-s3-key]"
}
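
Since this completion record is the only native signal, one option is to read it back and use it to clear whatever "in progress" marker is kept elsewhere. The following is a minimal sketch, assuming the connector's log group is wired to a second Lambda through a CloudWatch Logs subscription filter and that every record has the field names shown in the sample above. The function name `completed_transfers` is hypothetical.

```python
import base64
import gzip
import json


def completed_transfers(event):
    """Yield (transfer_id, file_path, status) for each connector log record.

    `event` is the payload a CloudWatch Logs subscription filter delivers
    to Lambda: base64-encoded, gzip-compressed JSON under
    event["awslogs"]["data"], with the raw log lines in "logEvents".
    """
    raw = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    payload = json.loads(raw)
    for log_event in payload.get("logEvents", []):
        try:
            record = json.loads(log_event["message"])
        except (KeyError, ValueError):
            continue  # skip lines that are not JSON records
        if not isinstance(record, dict):
            continue
        # Assumption: a RETRIEVE record carrying a status-code means the
        # transfer of that file has finished (successfully or not).
        if record.get("operation") == "RETRIEVE" and "status-code" in record:
            yield (
                record.get("transfer-id"),
                record.get("file-path"),
                record["status-code"],
            )
```

Each yielded tuple could then be used to mark the file as no longer in flight, whatever state store ends up holding that marker.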

However, I can't find a way to natively monitor the 'in progress' downloads in between these operations. Nowhere in AWS do I see a way to query any resources using the returned TransferId. Any suggestions on how to natively query in-flight downloads on an SFTP Connector? Or is my best bet to build and maintain my own DynamoDB table to manage state?

gbeaven
  • How does your Lambda know which file to download in the first place? I would use that as the `state store` instead if possible rather than having a new DynamoDB involved. – Register Sole Aug 30 '23 at 17:14
  • Using DynamoDB for distributed locks is not a bad idea IMO. It's likely to be pretty inexpensive for this use case. – jordanm Aug 30 '23 at 17:16
  • @RegisterSole The delta between the downloaded files in S3 and the available files on the SFTP server. The only natural states available with this process are "not yet started" and "completed". I'm looking for "in progress", but it doesn't seem to be natively available (yet). – gbeaven Aug 30 '23 at 17:25
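
The DynamoDB lock suggested in the comments could look roughly like the sketch below. It is not a native Transfer Family feature, just one way to record "in progress" per file with a conditional write. Everything named here is an assumption for illustration: the helper names `try_acquire` and `release`, a table whose partition key is `file_path`, and an `expires_at` attribute that lets a stale lock be taken over if a transfer never reports back. `table` is expected to behave like a boto3 DynamoDB `Table` resource.

```python
import time


def try_acquire(table, file_path, ttl_seconds=900, now=None):
    """Return True if this invocation may start transferring file_path.

    The write only succeeds when no lock item exists for the file, or
    when the existing lock has passed its expires_at timestamp.
    """
    now = int(time.time()) if now is None else now
    try:
        table.put_item(
            Item={"file_path": file_path, "expires_at": now + ttl_seconds},
            ConditionExpression=(
                "attribute_not_exists(file_path) OR expires_at < :now"
            ),
            ExpressionAttributeValues={":now": now},
        )
        return True
    except Exception as exc:
        # botocore's ClientError exposes the error code on exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # another invocation holds a live lock
        raise


def release(table, file_path):
    """Remove the lock once the transfer is known to have finished."""
    table.delete_item(Key={"file_path": file_path})
```

In the scheduled Lambda, `table` might be `boto3.resource("dynamodb").Table("transfer-locks")` (a hypothetical table name), with start_file_transfer called only for files where `try_acquire` returns True. `release` would be called once the completion log for that file shows up, and the expiry acts as a fallback if it never does.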

0 Answers