# k and g come from iterating over pandas groups, e.g.:
for k, g in df.groupby('some_column'):
    g.to_csv(f'/tmp/{k}.csv')
    

This example makes use of /tmp/. When /tmp/ is not used in g.to_csv(f'/tmp/{k}.csv'), it raises a "Read-only file system" error, as explained here: https://stackoverflow.com/a/42002539/13016237. So the question is whether AWS Lambda clears /tmp/ on its own or whether it has to be done manually. Is there any workaround for this within the scope of boto3? Thanks!

pc_pyr
  • what should the lambda do? save a csv (coming from pandas) to s3? – balderman Sep 14 '20 at 07:39
  • To ensure the file is cleaned up, you can use the [tempfile](https://docs.python.org/3/library/tempfile.html) module, either `NamedTemporaryFile` or `TemporaryDirectory`. – Jiří Baum Sep 14 '20 at 07:47
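Following the comment above, a minimal sketch of the `tempfile` approach; the DataFrame and file names here are hypothetical stand-ins for the groups in the question:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for one group produced in the loop.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# TemporaryDirectory deletes itself and everything in it on exit,
# so no stale files accumulate in /tmp between invocations.
with tempfile.TemporaryDirectory(dir='/tmp') as tmpdir:
    path = os.path.join(tmpdir, 'group.csv')
    df.to_csv(path, index=False)
    # ... upload `path` to S3 here, before the directory is removed ...

# After the with-block, the directory and the CSV no longer exist.
```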

2 Answers


/tmp, as the name suggests, is only temporary storage. It should not be relied upon for any long-term data storage. The files in /tmp persist for as long as the Lambda execution context is kept alive. That time is not defined and varies.

To overcome the size limitation (512 MB) and to ensure long-term data storage, two solutions are commonly employed: Amazon EFS and Amazon S3.

Using EFS is easier (but not cheaper), as it presents a regular filesystem to your function that you can read and write directly. You can also reuse the same filesystem across multiple Lambda functions, instances, containers and more.

S3 is cheaper, but some extra work is required to use it seamlessly from Lambda. Pandas does support S3, but for seamless integration you have to include s3fs in your deployment package (or layer) if it is not already present. S3 can likewise be accessed from different functions, instances and containers.

Marcin
  • Thanks @Marcin, so is there a way for clearing /tmp manually? Is it recommended – pc_pyr Sep 14 '20 at 07:47
  • @pc_pyr Yes, you can just use regular python tools for that. For example [shutil.rmtree](https://docs.python.org/3/library/shutil.html#shutil.rmtree) to remove folders that you create in `/tmp`. If you have some confidential data, you can shred the files yourself, or not store them in `/tmp` at all. – Marcin Sep 14 '20 at 07:49
  • @pc_pyr Yes, you can upload to s3, but you need S3Fs for seamless integration. I don't know specifics of pandas, so can't give more details on that. – Marcin Sep 14 '20 at 08:02
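As a sketch of the manual cleanup Marcin describes in the comment above (the directory name is hypothetical):

```python
import os
import shutil

# Hypothetical scratch directory created during one invocation.
scratch = '/tmp/my_lambda_scratch'
os.makedirs(scratch, exist_ok=True)
with open(os.path.join(scratch, 'part.csv'), 'w') as f:
    f.write('a,b\n1,2\n')

# Remove the whole tree at the end of the handler so the 512 MB
# of /tmp is reclaimed even if the execution context is reused.
shutil.rmtree(scratch, ignore_errors=True)
```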

`g.to_csv('s3://my_bucket/my_data.csv')` should work if you package s3fs with your Lambda.

Another option is to build the CSV in memory and use boto3 to create the object in S3.
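A minimal sketch of the in-memory approach; the bucket and key names are hypothetical, and boto3 is imported lazily so the serialization helper runs on its own:

```python
import io

import pandas as pd

def df_to_csv_string(df):
    # Serialize the DataFrame to CSV entirely in memory -- no /tmp needed.
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return buf.getvalue()

def upload_df_as_csv(df, bucket, key):
    # boto3 is imported here so df_to_csv_string can be used
    # without AWS credentials configured.
    import boto3
    s3 = boto3.client('s3')
    s3.put_object(Bucket=bucket, Key=key, Body=df_to_csv_string(df))
```

In the handler you would then call `upload_df_as_csv(g, 'my_bucket', f'{k}.csv')` inside the loop from the question.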

balderman