
I have a Lambda function that reads from DynamoDB and creates a large file (~500 MB) in /tmp, which is finally uploaded to S3. Once the upload finishes, the Lambda deletes the file from /tmp (since there is a high probability that the instance will be reused).

This function takes about 1 minute to execute, even ignoring latencies.

In this scenario, when I invoke the function again within less than a minute, I have no control over whether there will be enough space to write to /tmp, and my function fails.

Questions:
1. What are the known workarounds in this kind of scenario? (Potentially give more space in /tmp, or ensure a clean /tmp is given for each new execution.)
2. What are the best practices regarding file creation and management in Lambda?
3. Can I attach an EBS volume or other storage to Lambda for execution?
4. Is there a way to have file-system-like access to S3, so that my function can write directly to S3 instead of using /tmp?

sandeepzgk
  • I don't see why you NEED a file (or file system), especially considering that you're using FaaS/Amazon Lambda. Could you rewrite your code so that the DynamoDB output is streamed to S3 without writing it to disk? – C-Otto Jun 23 '16 at 11:36
  • There is a lot of processing that needs to be done, not just a simple dump from DynamoDB to S3. – sandeepzgk Jun 23 '16 at 11:44
  • Maybe you're just hitting the limit (512 MB) then? https://docs.aws.amazon.com/lambda/latest/dg/limits.html It might help to work in memory, or to add a third service for temporary storage in between. – C-Otto Jun 23 '16 at 11:49
  • @usama Not really; the best way is to clean up after you use it, or else just clean up before use. – sandeepzgk Oct 11 '18 at 18:51
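C-Otto's suggestion of streaming the output to S3 without writing it to disk can be sketched with an S3 multipart upload: chunks are buffered in memory and sent as parts, so /tmp is never touched. This is a minimal sketch, assuming a boto3 S3 client is passed in (`create_multipart_upload`, `upload_part`, `complete_multipart_upload` and `abort_multipart_upload` are real S3 client methods); the chunk generator standing in for the DynamoDB processing is hypothetical.

```python
from typing import Iterable


def stream_to_s3(chunks: Iterable[bytes], s3_client, bucket: str, key: str,
                 part_size: int = 8 * 1024 * 1024) -> str:
    """Upload an iterable of byte chunks to S3 via multipart upload.

    Real S3 requires every part except the last to be at least 5 MiB,
    so keep part_size >= 5 MiB in production. Returns the final ETag.
    """
    upload_id = s3_client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    buffer = bytearray()

    def flush() -> None:
        # Send the buffered bytes as the next numbered part.
        part_number = len(parts) + 1
        resp = s3_client.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                     PartNumber=part_number, Body=bytes(buffer))
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        buffer.clear()

    try:
        for chunk in chunks:
            buffer.extend(chunk)
            if len(buffer) >= part_size:
                flush()
        if buffer:
            flush()  # the last part may be smaller than part_size
        result = s3_client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts})
        return result["ETag"]
    except Exception:
        # Abort so S3 does not keep (and bill for) the orphaned parts.
        s3_client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Memory use stays around one `part_size` buffer regardless of the total object size, which is the point of streaming instead of building the whole file first.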

4 Answers


I doubt that two concurrently running instances of AWS Lambda share /tmp or any other local resource, since they must execute in complete isolation, so your error likely has a different explanation. If you mean that a subsequent invocation of AWS Lambda reuses the same instance, then you should simply clear /tmp yourself.
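Clearing /tmp yourself could look like the minimal sketch below: the handler wipes leftovers at the start of each invocation (inside the handler, not at module initialization) and removes its own file in a `finally` block, so a failed run does not leave a 500 MB file behind. The file name and the processing step are hypothetical placeholders.

```python
import os
import shutil

TMP_DIR = "/tmp"


def clear_tmp(tmp_dir: str = TMP_DIR) -> None:
    """Delete everything inside tmp_dir (dotfiles included) but keep the directory."""
    for entry in os.scandir(tmp_dir):
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)  # regular files and symlinks


def handler(event, context):
    # A reused execution environment may still hold files from an earlier invocation.
    clear_tmp()
    path = os.path.join(TMP_DIR, "export.dat")  # hypothetical file name
    try:
        with open(path, "wb") as f:
            f.write(b"...")  # placeholder for the real DynamoDB processing
        # Upload here, e.g. boto3.client("s3").upload_file(path, bucket, key)
    finally:
        # Runs even if processing or the upload raised an exception.
        if os.path.exists(path):
            os.remove(path)
```

Unlike the shell glob `rm -rf /tmp/*`, `os.scandir` also picks up hidden files whose names start with a dot.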

In general, if your Lambda is a resource hog, you are better off doing that work in an ECS container worker and using the Lambda to launch ECS tasks, as described here.
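A minimal sketch of a Lambda that only launches an ECS task, assuming the Fargate launch type and boto3's ECS `run_task` call. The cluster, task definition, container name, subnet and the `JOB_PAYLOAD` environment variable are hypothetical; the subnets must be able to reach ECR, DynamoDB and S3 for the container to do its work.

```python
import json

# Hypothetical names: replace with your own cluster, task definition and network.
CLUSTER = "export-cluster"
TASK_DEFINITION = "dynamodb-export"
CONTAINER_NAME = "exporter"
SUBNETS = ["subnet-00000000"]


def build_run_task_args(payload: dict) -> dict:
    """Keyword arguments for the ECS RunTask API."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": SUBNETS, "assignPublicIp": "DISABLED"}
        },
        "overrides": {
            "containerOverrides": [{
                "name": CONTAINER_NAME,
                # Hand the job description to the container as an environment variable.
                "environment": [{"name": "JOB_PAYLOAD", "value": json.dumps(payload)}],
            }]
        },
    }


def handler(event, context, ecs_client=None):
    if ecs_client is None:
        import boto3  # bundled with the Lambda Python runtimes
        ecs_client = boto3.client("ecs")
    response = ecs_client.run_task(**build_run_task_args(event))
    if response.get("failures"):
        raise RuntimeError(f"ECS could not start the task: {response['failures']}")
    # The Lambda returns immediately; the heavy export runs inside the container.
    return response["tasks"][0]["taskArn"]
```

The container gets its own disk and memory limits, so the Lambda's /tmp size no longer constrains the export.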

Leon
  • I have seen that the function is reused a lot, and I have faced issues where the instance is reused, which means the /tmp folder is unavailable on that instance. – sandeepzgk Jun 23 '16 at 11:46
  • This answer is correct. Instances are often reused, **but not concurrently**. Clean up your temp directory when the function finishes -- or just remove any files that are present when it starts (meaning after the handler is invoked, not during initialization, of course) -- and this should not be a problem. +1 – Michael - sqlbot Jun 23 '16 at 12:59
  • This is only a _partial_ answer to the four questions listed and I'm actually interested in the answers to the _unanswered_ ones. – Anthony Atkinson Jan 17 '17 at 14:22
  • @usama `rm -rf /tmp/*` – Alexander Korzhykov Mar 01 '18 at 14:20
  • @Leon, are you sure? [This excerpt](https://docs.aws.amazon.com/lambda/latest/dg/running-lambda-code.html) from the AWS documentation states (of the /tmp dir): 'The directory content remains when the Execution Context is frozen, providing transient cache that can be used for multiple invocations' – Peza Jul 03 '18 at 21:41
  • @Peza Yes, I am sure. The context can be reused by a subsequent invocation of the *same* AWS Lambda instance. Two different Lambda instances do not share the `/tmp` directory. You can also read [my other answer](https://stackoverflow.com/questions/38016683/aws-lambda-and-java-concurrency/38041246#38041246) – Leon Jul 04 '18 at 14:43

You are likely running into the 512 MB /tmp limit of AWS Lambda.

You can improve your performance and address your problem by storing the file in-memory, since the memory limit for Lambda functions can go as high as 1.5 GB.
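Keeping the file in memory instead of /tmp might look like the minimal sketch below, assuming CSV output (a hypothetical format) and a boto3 S3 client whose real `upload_fileobj(Fileobj, Bucket, Key)` method accepts any binary file-like object. The function's memory setting must comfortably exceed the file size.

```python
import csv
import io


def build_csv_in_memory(rows) -> io.BytesIO:
    """Write rows as UTF-8 CSV into a BytesIO buffer instead of a file in /tmp."""
    buf = io.BytesIO()
    # newline="" lets the csv module control line endings itself.
    text = io.TextIOWrapper(buf, encoding="utf-8", newline="")
    csv.writer(text).writerows(rows)
    text.flush()
    # detach() returns the BytesIO without closing it.
    return text.detach()


def upload_buffer(buf: io.BytesIO, s3_client, bucket: str, key: str) -> None:
    buf.seek(0)  # rewind: writing left the position at the end
    s3_client.upload_fileobj(buf, bucket, key)
```

Wrapping the `BytesIO` in a `TextIOWrapper` avoids holding the data twice (once as `str`, once as `bytes`), which matters when the file is hundreds of megabytes.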

Mark Stosberg
  • The issue I was facing is not technically the /tmp size limit: since the containers are being reused, /tmp is not really empty when I get it; it still holds files from previous executions. – sandeepzgk Oct 11 '18 at 19:15
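Since leftovers from earlier executions are the real cause here, a cheap safeguard is to check free space before writing, so the function fails early with a clear message instead of midway through the export. A minimal sketch using the standard library's `shutil.disk_usage`; the required size (for example `500 * 1024 * 1024` for the ~500 MB file) is up to the caller.

```python
import os
import shutil


def ensure_free_space(required_bytes: int, path: str = "/tmp") -> int:
    """Return the free bytes at path, or raise if fewer than required_bytes remain."""
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        # List what is occupying the space to make the failure easy to diagnose.
        leftovers = sorted(os.listdir(path))
        raise RuntimeError(
            f"{path} has {free} bytes free but {required_bytes} are needed; "
            f"leftover entries: {leftovers[:20]}")
    return free
```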

Since March 2022, Lambda supports increasing the maximum size of the /tmp directory up to 10,240 MB. More information is available here.

Konrad Kalemba

Now it is even easier: the Lambda /tmp storage, called Ephemeral Storage, can be increased to 10 GB. The setting is available in the General configuration of the AWS Lambda function.
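The same setting can be changed programmatically. A minimal sketch, assuming a boto3 Lambda client: `update_function_configuration` takes an `EphemeralStorage={"Size": ...}` argument in MB, and the range check mirrors the documented 512 to 10,240 MB limits. The function name in the usage line is hypothetical.

```python
def set_ephemeral_storage(lambda_client, function_name: str, size_mb: int) -> dict:
    """Resize the function's /tmp (ephemeral storage) to size_mb megabytes."""
    # Lambda accepts between 512 MB (the default) and 10,240 MB.
    if not 512 <= size_mb <= 10240:
        raise ValueError("ephemeral storage must be between 512 and 10240 MB")
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        EphemeralStorage={"Size": size_mb},
    )
```

For example, `set_ephemeral_storage(boto3.client("lambda"), "my-export-function", 2048)` would give the ~500 MB export plenty of headroom, although cleaning /tmp between invocations is still worthwhile because the larger space is also reused across invocations.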