I've written a small server function which is intended to tar together a bunch of locally downloaded files, then delete the originals. It looks something like this:

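import os
import tarfile

with tarfile.open(archive_filename, "w:gz") as tar:
    for pb in designated_objects:
        bucket.download_file(pb.key, pb.key)  # download the object to a local file
        tar.add(pb.key)                       # add the local file to the archive
        os.remove(pb.key)                     # delete the local copy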

My expectation is that this will generate a tarfile with all of my desired data and an otherwise empty directory. The idea is to minimize disk usage as much as possible. However, I'm unsure whether it is safe to delete a file before the tarfile has finished being written (as done here).

Will this code work as expected?

If it will not, is there something akin to an append mode that will?

Aleksey Bilogur
  • Seems like the simplest way to find out would be to just try it. – dave Jan 17 '18 at 01:28
  • Is this for an AWS S3 bucket? Have you considered using [`download_fileobj`](http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.download_fileobj) instead? That way you won't have to bother putting duplicate data on disk just to delete it moments later. – sytech Jan 17 '18 at 02:10
  • @sytech This is indeed an AWS S3 operation. I'm impressed you've caught onto it! And it's a Lambda function that I'm trying to optimize memory usage on. – Aleksey Bilogur Jan 17 '18 at 02:51
  • @dave I did try it. See the answer below. – Aleksey Bilogur Jan 17 '18 at 02:57
  • @AlekseyBilogur Then using `download_fileobj` sounds like the way to go. Because `tarfile.open` accepts file-like objects, you should be able to download the file directly into your archive without putting it on disk in between. – sytech Jan 17 '18 at 03:56
  • I may go back to that approach. But at the moment the simplest solution for my use-case seems to be https://stackoverflow.com/questions/25086722/downloading-pattern-matched-entries-from-s3-bucket actually. – Aleksey Bilogur Jan 18 '18 at 21:28
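
For reference, the `download_fileobj` suggestion from the comments above could be sketched roughly as follows. This is only a sketch under assumptions: `archive_bucket_objects` is a made-up helper name, `bucket` is assumed to be a boto3 `Bucket` resource (or anything with a compatible `download_fileobj(key, fileobj)` method), and each object is buffered in memory in full before being added, which may matter on Lambda.

```python
import io
import tarfile


def archive_bucket_objects(bucket, keys, archive_filename):
    """Add each object to a gzip-compressed tar without writing temp files to disk.

    `bucket` is assumed to expose download_fileobj(key, fileobj), as a boto3
    S3 Bucket resource does. This helper itself is hypothetical.
    """
    with tarfile.open(archive_filename, "w:gz") as tar:
        for key in keys:
            buffer = io.BytesIO()
            # Download the object's bytes into the in-memory buffer.
            bucket.download_fileobj(key, buffer)
            # addfile() needs the size up front, so record it before rewinding.
            info = tarfile.TarInfo(name=key)
            info.size = buffer.tell()
            buffer.seek(0)
            tar.addfile(info, fileobj=buffer)
```

Only one object is held in memory at a time, so peak memory is roughly the size of the largest object rather than the whole archive.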

1 Answer

As expected, the original files are downloaded, then deleted. However, the behavior of the archive is unusual. When I run this code block, no archive is generated. In fact, the code block appears to do nothing at all (except delete your files).

I find this particularly surprising given that putting only a `pass` inside the `with` statement (as in the code that follows) does write an empty archive to disk. So in a sense, the given code block does even less than nothing!

with tarfile.open('archive_filename.xy.gz', "w:gz") as tar:
    pass

For reference, this behavior is what I get with Python 3.6. Behavior with other versions of Python may differ.
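
Since results may vary by version, a small self-contained check like the one below may help when comparing. It is only a sketch: local temporary files stand in for the S3 downloads, `os.remove` is assumed for the deletion, and `tar_then_delete` and the file names are made up for illustration.

```python
import os
import tarfile


def tar_then_delete(workdir, contents, archive_path):
    """Write each file, add it to the archive, then delete it.

    Returns (names found in the finished archive, files left in workdir).
    `contents` maps file names to bytes and stands in for the downloads.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, data in contents.items():
            path = os.path.join(workdir, name)
            with open(path, "wb") as f:  # stand-in for bucket.download_file
                f.write(data)
            tar.add(path, arcname=name)
            os.remove(path)  # delete the original before the archive is closed
    # Reopen the finished archive and list what actually ended up inside.
    with tarfile.open(archive_path, "r:gz") as tar:
        archived = tar.getnames()
    return archived, os.listdir(workdir)
```

Calling it with a temporary working directory (for example from `tempfile.TemporaryDirectory()`) and an archive path outside that directory shows which files made it into the archive and whether anything was left behind.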

Aleksey Bilogur