
I'm working in a memory-constrained environment where I need to make archives of SQL dumps. If I use Python's built-in tarfile module, is the '.tar' file held in memory or written to disk as it's created?

For instance, in the following code, if huge_file.sql is 2 GB, will the tar variable take up 2 GB of memory?

import tarfile

tar = tarfile.open("my_archive.tar.gz", "w|gz")
tar.add('huge_file.sql')
tar.close()
Chris W.

1 Answer


No, it is not loaded into memory. You can read the source for tarfile and see that it uses copyfileobj, which copies from the file to the tarball through a fixed-size buffer:

def copyfileobj(src, dst, length=None):
    """Copy length bytes from fileobj src to fileobj dst.
       If length is None, copy the entire content.
    """
    if length == 0:
        return
    if length is None:
        shutil.copyfileobj(src, dst)
        return

    BUFSIZE = 16 * 1024
    blocks, remainder = divmod(length, BUFSIZE)
    for b in xrange(blocks):
        buf = src.read(BUFSIZE)
        if len(buf) < BUFSIZE:
            raise IOError("end of file reached")
        dst.write(buf)

    if remainder != 0:
        buf = src.read(remainder)
        if len(buf) < remainder:
            raise IOError("end of file reached")
        dst.write(buf)
    return
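As a minimal sketch of how that applies to the code in the question (assuming Python 2.7+ for the with statement, and reusing the filenames from the question), the stream mode "w|gz" compresses and writes each member to disk as it is read, so peak memory stays around the size of the copy buffer rather than the size of the dump:

import tarfile

# "w|gz" opens the archive in stream mode: huge_file.sql is read in
# fixed-size chunks (see copyfileobj above) and the compressed output is
# written straight to my_archive.tar.gz, so the 2 GB dump is never held
# in memory all at once.
with tarfile.open("my_archive.tar.gz", "w|gz") as tar:
    tar.add("huge_file.sql")

tarfile.open also accepts a bufsize argument if you want to tune the block size used for writing, though the default should be fine here.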
ire_and_curses
Spike Gronim
  • +1 for linking to the source. Development docs now also have a link to sources http://docs.python.org/dev/library/tarfile – jfs Mar 10 '11 at 22:35