I have a very simple zipfile implementation in Python that extracts some zip files successfully but consistently fails on others. In both cases the code runs without raising any error; for the failing archives, however, the files are NOT extracted from the zip.

Any ideas as to how I might investigate why the code runs without error but does not actually extract the files in those cases? I have tested the files with is_zipfile and also checked compress_type, and the results look the same whether the zipfile ultimately works or consistently fails. So I am not sure how to pinpoint what the 'difference' is in the files that fail.

import zipfile

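def unzip(ziph):
    # Extract every member of the archive under C:\
    ziph.extractall('C:\\')

if __name__ == '__main__':
    # The context manager closes the archive even if extraction raises.
    with zipfile.ZipFile('foo.zip', 'r') as ziph:
        unzip(ziph)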

The last thing I can add is that both extract and extractall succeed on the zipfiles that work, and both fail to extract (but execute without error) on the zipfiles that fail. Python 2.7.6 ... not sure what else to include.

Dharman
10mjg
  • Can you include the `unzip_dir` function definition? – Oin Feb 20 '16 at 00:33
  • Ah I am sorry. unzip_dir is just unzip - (I cleaned up my code as I entered it onto stackoverflow and left that inconsistency) - fixing now - – 10mjg Feb 20 '16 at 00:39
  • If member filenames in the archive have an absolute path, it is possible for them to be extracted to a directory other than the one you've specified. – martineau Feb 20 '16 at 00:40
  • @martineau: that may be the key. they DO have an absolute path. obviously i will now look up how to change that or work around that if needed. if you have a sense... i am all ears! – 10mjg Feb 20 '16 at 00:52
  • Based on the Note in the [documentation](https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.extract), it sounds like if you use the `.extract()` method instead of `.extractall` (and iterate through the members of the archive yourself), it will override any absolute paths if you specify one in the call. – martineau Feb 20 '16 at 01:01
  • @Martineau - thanks - I am trying - I am now iterating over the members of the archive using .namelist and then trying to use .extract with arguments (namelist_file, 'C:\\') ... still not working. Ahh feels so close but still no cigar.... – 10mjg Feb 20 '16 at 01:11
  • If you look at the source for the `namelist()` method in the `zipfile` module (which is in [Lib/zipfile.py](https://hg.python.org/cpython/file/2.7/Lib/zipfile.py#l872)), it looks like what it puts in the list may include the entire full path — so you would need to extract just the part you want using `os.path.basename()` or similar and use that in your calls to `extract()`. – martineau Feb 20 '16 at 01:21
  • Ah. I figured it out. It seems to concatenate whatever I pass as the second argument to extract with the path stored in the zipfile. So if I pass it C:\ it goes to C:\built_in_path; if I pass it S:\ it goes to S:\built_in_path. I was looking for it in my specified directory, not in my directory + the built-in path. Ugh... problem solved though, sort of. The only question that remains is how to actually get the thing into your specified path WITHOUT concatenating with the built-in path... thank you... – 10mjg Feb 20 '16 at 01:21
  • @Martineau: I tried using os.path.basename() to get just the filename without the path and then passing that to extract(). But doing it that way, extract() can't find the file (that seems sort of obvious). But I am too dense to think of how to use basename in the right order such that it extracts the right file, but then writes it to the path of my choice with the prior path stripped. If it's easy for you to write a couple lines of code showing in what order you would do that, obviously that'd be great. Otherwise I can probably figure it out... – 10mjg Feb 20 '16 at 01:28
  • Take a look at [this answer](http://stackoverflow.com/a/4917469/355230). – martineau Feb 20 '16 at 02:57
  • Try `'rb'` instead of `'r'`. – Mark Adler Feb 20 '16 at 03:37
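
The comments above converge on two steps: inspect the paths stored in the archive, then extract each member under its base name only, so the stored directory part is not appended to the target directory. A minimal sketch of that idea follows. The helper names `list_members` and `extract_flat` are hypothetical, chosen for illustration, and this is one possible approach rather than necessarily the one in the linked answer. It relies only on `ZipFile.namelist()`, `ZipFile.open()`, `os.path.basename()` and `shutil.copyfileobj()`.

```python
import os
import shutil
import zipfile


def list_members(zip_path):
    """Return the member names exactly as stored in the archive."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        return zf.namelist()


def extract_flat(zip_path, dest_dir):
    """Write every file member directly into dest_dir, dropping stored paths.

    Hypothetical helper: members that share a base name overwrite each other.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    written = []
    with zipfile.ZipFile(zip_path, 'r') as zf:
        for name in zf.namelist():
            base = os.path.basename(name)
            if not base:
                # Names ending in '/' are directory entries; nothing to write.
                continue
            target = os.path.join(dest_dir, base)
            # Read the member by its full stored name, write it under base name.
            with zf.open(name) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Printing `list_members('foo.zip')` for a working and a failing archive should show whether the failing one stores a leading directory path. Calling `extract_flat('foo.zip', 'C:\\')` would then place the files directly under `C:\`, at the cost of losing the archive's directory structure.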

0 Answers