
I'm trying to extract files from a zip file using Python 2.7.1 (on Windows, fyi) and each of my attempts shows extracted files with Modified Date = time of extraction (which is incorrect).

import os,zipfile
outDirectory = 'C:\\_TEMP\\'
inFile = 'test.zip'
fh = open(os.path.join(outDirectory,inFile),'rb') 
z = zipfile.ZipFile(fh)
for name in z.namelist():
    z.extract(name,outDirectory)
fh.close()

I also tried using the .extractall method, with the same results.

import os,zipfile
outDirectory = 'C:\\_TEMP\\'
inFile = 'test.zip'
zFile = zipfile.ZipFile(os.path.join(outDirectory,inFile))        
zFile.extractall(outDirectory)

Can anyone tell me what I'm doing wrong?

I'd like to think this is possible without having to post-correct the modified time per "How do I change the file creation date of a Windows file?".

MTAdmin

4 Answers


Well, it does take a little post-processing, but it's not that bad:

import os
import zipfile
import time

outDirectory = 'C:\\TEMP\\'
inFile = 'test.zip'
fh = open(os.path.join(outDirectory,inFile),'rb') 
z = zipfile.ZipFile(fh)

for f in z.infolist():
    name, date_time = f.filename, f.date_time
    name = os.path.join(outDirectory, name)
    with open(name, 'wb') as outFile:
        outFile.write(z.open(f).read())
    # date_time is a 6-tuple; mktime wants 9 fields (-1 = DST unknown)
    date_time = time.mktime(date_time + (0, 0, -1))
    os.utime(name, (date_time, date_time))

fh.close()

Okay, maybe it is that bad.

Alexander O'Mara
Ethan Furman
  • 1
    That works, thanks. Just for clarification, is maintaining the mod date on extraction of files a limitation of Python's zipfile implementation or is this standard functionality across all zip libs? – MTAdmin Mar 22 '12 at 14:01
  • 1
    It's been 5 years since the reply, and it is still working under Python 3.6. It still is ugly as the day you wrote it, but it works. – EndermanAPM Feb 21 '17 at 12:13
  • Is this still needed with Python 3.6? Looking at source code I don't see any changes related to date/time information... – RvdK Apr 20 '18 at 08:08
  • Can you give a clarification why you add (0, 0, -1) to the date_time? Is the current a second off? – RvdK Apr 20 '18 at 08:27
  • @RvdK: I don't know if it is still needed in Python 3.6 -- try it and see. – Ethan Furman Apr 20 '18 at 12:48
  • 2
    @RvdK: The `(0, 0, -1)` is added because [`time.mktime`](https://docs.python.org/2/library/time.html#time.mktime) expects a 9-element `tuple`. The `-1` indicates that the `DST` flag is unknown. – Ethan Furman Apr 20 '18 at 12:48
  • Is it possible to get the timezone info for the zipped files? If a zip file was downloaded from a server with a different timezone, then the mod time for files is not retained as expected. – wypul Sep 23 '20 at 07:49
  • The ZIP format is ancient and doesn't have any concept of time zone or DST. Thus the use of mktime (inverse localtime) here. Yes, this means that files from a server running on UTC may be in the future when unzipped on a system running in the western hemisphere.... – nealmcb Jan 01 '21 at 14:27
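
The `(0, 0, -1)` padding and the local-time behaviour discussed in these comments can be checked with a short sketch. The helper name `zip_time_to_epoch` is hypothetical (it is not part of `zipfile`), and the code is written to run on Python 3, though it should also work on 2.7. It assumes the six-field tuple has the same shape as `ZipInfo.date_time`.

```python
import time

def zip_time_to_epoch(date_time):
    """Convert a 6-field (Y, M, D, h, m, s) tuple to seconds since the epoch.

    The tuple is interpreted as local time. The appended (0, 0, -1) fills the
    weekday and day-of-year fields (mktime does not need them) and sets the
    DST flag to -1, meaning "unknown, let the C library decide".
    """
    return time.mktime(tuple(date_time) + (0, 0, -1))

stamp = zip_time_to_epoch((2012, 3, 21, 14, 30, 10))

# Converting back with localtime() recovers the same wall-clock fields,
# whatever time zone the machine is set to.
print(time.localtime(stamp)[:6])
```

Because the conversion goes through local time, the same archive yields different epoch values on machines in different time zones, which matches the behaviour described in the last comment.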

Based on jia103's answer, I have developed a function (using Python 2.7.14) which restores directory and file dates AFTER everything has been extracted. This isolates any ugliness in the function, and you can also use zipfile.ZipFile.extractall() or whatever zip extract method you want:

import time
import zipfile
import os

# Restores the timestamps of zipfile contents.
def RestoreTimestampsOfZipContents(zipname, extract_dir):
    for f in zipfile.ZipFile(zipname, 'r').infolist():
        # path to this extracted f-item
        fullpath = os.path.join(extract_dir, f.filename)
        # still need to adjust the dt o/w item will have the current dt
        date_time = time.mktime(f.date_time + (0, 0, -1))
        # update dt
        os.utime(fullpath, (date_time, date_time))

To preserve dates, just call this function after your extract is done.

Here's an example, from a script I wrote to zip/unzip game save directories:

        z = zipfile.ZipFile(zipname, 'r')
        print 'I have opened zipfile %s, ready to extract into %s' \
                % (zipname, gamedir)
        try: os.makedirs(gamedir)
        except: pass    # Most of the time dir already exists
        z.extractall(gamedir)
        RestoreTimestampsOfZipContents(zipname, gamedir)  #<--  USED
        print '%s zip extract done' % GameName[game]
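
The extract-then-restore approach can be checked end to end with a throwaway archive. This is a minimal sketch, not the script above: `restore_timestamps` is a hypothetical rename of the same idea, the archive name, entry name, and timestamp are made up for the demo, and the code targets Python 3 (it should also run on 2.7).

```python
import os
import tempfile
import time
import zipfile

def restore_timestamps(zipname, extract_dir):
    """Set each extracted item's mtime from the archive's stored date_time."""
    with zipfile.ZipFile(zipname, 'r') as z:
        for info in z.infolist():
            fullpath = os.path.join(extract_dir, info.filename)
            stamp = time.mktime(info.date_time + (0, 0, -1))
            os.utime(fullpath, (stamp, stamp))

# Build a small archive whose single entry carries a known timestamp.
work = tempfile.mkdtemp()
zipname = os.path.join(work, 'demo.zip')
entry = zipfile.ZipInfo('save.dat', date_time=(2012, 3, 21, 14, 30, 10))
with zipfile.ZipFile(zipname, 'w') as z:
    z.writestr(entry, b'data')

# Extract with the stock method, then fix the timestamps afterwards.
out = os.path.join(work, 'out')
with zipfile.ZipFile(zipname, 'r') as z:
    z.extractall(out)
restore_timestamps(zipname, out)

restored = time.localtime(os.path.getmtime(os.path.join(out, 'save.dat')))[:6]
print(restored)
```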

Thanks everyone for your previous answers!

RubinMac

Based on Ethan Furman's answer, I have developed this version (using Python 2.6.6) which is a little more concise:

import os
import time
from zipfile import ZipFile

zf = ZipFile('archive.zip', 'r')
for zi in zf.infolist():
    zf.extract(zi)
    date_time = time.mktime(zi.date_time + (0, 0, -1))
    os.utime(zi.filename, (date_time, date_time))
zf.close()

This extracts to the current working directory and uses the ZipFile.extract() method to write the data instead of creating the file itself.

Ber
  • I am trying to use this code, but all files are saving to my root directory. where would I define the path where I want the file saved? – new_programmer_22 Sep 02 '19 at 14:34
  • @newwebdev22 The code extracts to the current directory. You can either change that or add a path argument to the extract() method, as stated in the docs: https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.extract Also, you may want to check your ZIP archive for absolute paths. – Ber Sep 10 '19 at 07:24
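
The path-argument suggestion can be sketched as follows. The function name `extract_with_times`, the archive contents, and the directory names are hypothetical and exist only for the demo. The sketch relies on `ZipFile.extract()` returning the path it created, so the same path can be passed to `os.utime`. It skips directory entries and is written for Python 3.

```python
import os
import tempfile
import time
import zipfile

def extract_with_times(zipname, target_dir):
    """Extract all members into target_dir, then restore file mtimes."""
    with zipfile.ZipFile(zipname, 'r') as zf:
        for zi in zf.infolist():
            # extract() returns the path it created under target_dir
            extracted = zf.extract(zi, target_dir)
            if os.path.isdir(extracted):
                continue  # directory timestamps are not handled in this sketch
            stamp = time.mktime(zi.date_time + (0, 0, -1))
            os.utime(extracted, (stamp, stamp))

# Demo with a throwaway archive holding one entry in a subfolder.
work = tempfile.mkdtemp()
zipname = os.path.join(work, 'archive.zip')
entry = zipfile.ZipInfo('sub/file.txt', date_time=(2019, 9, 2, 14, 34, 0))
with zipfile.ZipFile(zipname, 'w') as zf:
    zf.writestr(entry, b'hello')

target = os.path.join(work, 'wanted_dir')
extract_with_times(zipname, target)
result = os.path.join(target, 'sub', 'file.txt')
print(os.path.exists(result))
```

Using the returned path rather than `zi.filename` keeps the `os.utime` call pointed at the file that was actually written, regardless of the current working directory.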

Based on Ber's answer, I have developed this version (using Python 2.7.11), which also accounts for directory mod dates.

from os import path, utime
from sys import exit
from time import mktime
from zipfile import ZipFile

def unzip(zipfile, outDirectory):
    dirs = {}

    with ZipFile(zipfile, 'r') as z:
        for f in z.infolist():
            name, date_time = f.filename, f.date_time
            name = path.join(outDirectory, name)
            z.extract(f, outDirectory)

            # still need to adjust the dt o/w item will have the current dt
            date_time = mktime(f.date_time + (0, 0, -1))

            if (path.isdir(name)):
                # changes to dir dt will have no effect right now since files are
                # being created inside of it; hold the dt and apply it later
                dirs[name] = date_time
            else:
                utime(name, (date_time, date_time))

    # done creating files, now update dir dt
    for name in dirs:
        date_time = dirs[name]
        utime(name, (date_time, date_time))

if __name__ == "__main__":

    unzip('archive.zip', 'out')

    exit(0)

Since directories are being modified as the extracted files are being created inside them, there appears to be no point in setting their dates with os.utime until after the extraction has completed, so this version caches the directory names and their timestamps till the very end.
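
The reason for deferring directory timestamps can be seen in a small experiment. This is a sketch under the assumption of a typical filesystem, where creating a file inside a directory updates that directory's modification time. The file name and the date are arbitrary.

```python
import os
import tempfile
import time

d = tempfile.mkdtemp()

# Give the directory an old modification time, as an early utime() call would.
old = time.mktime((2012, 3, 21, 14, 30, 10, 0, 0, -1))
os.utime(d, (old, old))
before = os.path.getmtime(d)

# Creating a file inside the directory bumps the directory's mtime again,
# discarding the value that was just set.
with open(os.path.join(d, 'new.txt'), 'w') as f:
    f.write('x')
after = os.path.getmtime(d)

print(before == old, after > before)
```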

jia103