
I have a series of text files in a folder that I want to zip. I compress the folder and produce a zip file.

When I open the zip file programmatically, I get an error: `BadZipFile: File is not a zip file`.

I have been testing the zipped directory using this piece of code:

```python
import zipfile
print(zipfile.is_zipfile("~/path/to/zipfile.zip"))
# output: False
```
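A note on this check, as a hedged sketch that assumes the `~` is meant to refer to the home directory: Python's `zipfile` (like `open()`) does not expand `~`, so the path is looked up literally. `zipfile.is_zipfile()` also returns `False` rather than raising when the file cannot be opened, so a missing file and a non-zip file look the same. The helper below (`check_zip` is a hypothetical name) expands the path first and makes the "file not found" case explicit.

```python
import os
import zipfile

def check_zip(path):
    """Return True if `path` is a zip file; raise if the file does not exist.

    Hypothetical helper: expands a leading "~" first, because zipfile
    (like open()) treats "~" as a literal directory name.
    """
    full_path = os.path.expanduser(path)
    if not os.path.isfile(full_path):
        # zipfile.is_zipfile() would just return False here, hiding the
        # fact that the file was never found.
        raise FileNotFoundError(f"no such file: {full_path}")
    return zipfile.is_zipfile(full_path)

# Path from the question; substitute your own.
# print(check_zip("~/path/to/zipfile.zip"))
```

If this raises `FileNotFoundError`, the `False` came from the path rather than from the archive itself.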

I have even tried creating a new zip file programmatically with the code below and running the checker above on it, but that also returns False:

```python
import os
import zipfile

def zipdir(path, ziph):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))

zipf = zipfile.ZipFile('Zipped_file.zip', 'w', zipfile.ZIP_DEFLATED)
zipdir('~/Desktop/cleaned_files_2', zipf)
zipf.close()
```
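A sketch of the same function with the path expanded first, again assuming `~` should mean the home directory. `os.walk()` does not expand `~` and, by default, ignores directories it cannot list, so with an unexpanded path the loop body never runs and the archive comes out empty. The signature here (taking an output path and returning the stored names) is an illustrative change, not the original code.

```python
import os
import zipfile

def zipdir(path, zip_path):
    """Zip every file under `path` into `zip_path`; return the stored names."""
    # Expand "~" and normalise (this also drops a trailing slash).
    path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(path):
        # Without this check, os.walk() would silently yield nothing.
        raise FileNotFoundError(f"no such directory: {path}")
    parent = os.path.dirname(path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as ziph:
        for root, dirs, files in os.walk(path):
            for file in files:
                full = os.path.join(root, file)
                # Don't add the archive to itself if it lives inside `path`.
                if os.path.abspath(full) == os.path.abspath(zip_path):
                    continue
                # Store e.g. "cleaned_files_2/doc.txt" instead of the full disk path.
                ziph.write(full, os.path.relpath(full, parent))
        return ziph.namelist()

# print(zipdir("~/Desktop/cleaned_files_2", "Zipped_file.zip"))
```

Using `with` closes the archive even if an exception occurs partway through, which the original's explicit `zipf.close()` does not guarantee.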

What am I doing wrong that I am not producing valid zipped directories?

John Rotenstein
RustyShackleford
  • Python code looks ok. Can you test the zipped file you've created with one of the command-line zip programs? For example, what does `zip -t Zipped_file.zip` output? – pmqs Jul 10 '21 at 17:22
  • @pmqs I got this error in the terminal when I ran your line: `zip error: Invalid command arguments (invalid date entered for -t option - use mmddyyyy or yyyy-mm-dd)`. Is there anything else I can do to test? – RustyShackleford Jul 10 '21 at 17:23
  • Sorry, typo on my part: I meant `zip -T Zipped_file.zip`. Also try `unzip -t Zipped_file.zip` – pmqs Jul 10 '21 at 17:29
  • @pmqs the above Python code is generating an empty zip file. I went back and manually created a zip file. I ran both your lines and got this: `zip -T cleaned_files_2.zip test of cleaned_files_2.zip OK ` and `unzip -T cleaned_files_2.zip Updated time stamp for cleaned_files_2.zip.` I ran cleaned_files_2.zip through the zipfile checker in my first piece of code, and still get `False`. – RustyShackleford Jul 10 '21 at 17:34
  • @pmqs I hardcoded the file path in the original function and passed the resulting zip file to the checker, and I got `True` this time. Experimenting now to see if it fixes the issue. – RustyShackleford Jul 10 '21 at 17:38
  • Your turn for the typo. The `unzip` command uses a lower case `t` to trigger a test of the zip file, so the command is `unzip -t Zipped_file.zip`. That said, if the `zip -T` command thought the file was OK, I'd expect the `unzip -t` test to also think the file is OK. – pmqs Jul 10 '21 at 17:39
  • @pmqs I reran `unzip -t`; all the files are OK, but my process still says that the zip file is not a zip file. I noticed that I have 2 files that may be causing a problem: `testing: __MACOSX/._cleaned_files_2 ` and `cleaned_files_2/.DS_Store`. How do I get rid of these, and why do I have them in the zip package? – RustyShackleford Jul 10 '21 at 17:47
  • I tried your Python code and it worked fine on my Linux setup. Just to be clear: at what point are you running the `unzip -t` command? Is it on the Mac before you upload to S3? Check that the file hasn't been changed by the upload/download process. Run `cksum Zipped_file.zip` on your Mac before uploading, then run it again on the file that is read from S3. The output should be identical. – pmqs Jul 10 '21 at 17:50
  • @pmqs I am running `unzip -t` before uploading to S3. I redownloaded the file from S3 to local and the cksum is the same. This is interesting. I think the issue may be something like the one in this link: https://stackoverflow.com/questions/3083235/unzipping-file-results-in-badzipfile-file-is-not-a-zip-file. If it helps, I am following the tutorial https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial1_Basic_QA_Pipeline.ipynb and passing my own zip file in the `Preprocessing of documents` section, S3 part (the notebook has no line numbers). – RustyShackleford Jul 10 '21 at 17:57
  • See https://stackoverflow.com/questions/63183098/remove-auto-generated-macosx-folder-from-inside-a-zip-file-in-python for dealing with the `__MACOSX` files – pmqs Jul 10 '21 at 18:03
  • @pmqs if you want to leave an answer, I will accept it. – RustyShackleford Jul 10 '21 at 18:17
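The two checks discussed in the comments can also be sketched in Python. `ZipFile.testzip()` reads every member and verifies its CRC (comparable to `unzip -t`), and the `__MACOSX/` and `.DS_Store` entries, which typically come from compressing a folder with macOS Finder, can be dropped by copying the archive entry by entry. This is a hedged sketch, not the code from the linked question; `clean_zip` and `is_macos_metadata` are hypothetical names.

```python
import zipfile

def is_macos_metadata(name):
    """True for Finder artifacts such as "__MACOSX/..." or ".../.DS_Store"."""
    return name.startswith("__MACOSX/") or name.rsplit("/", 1)[-1] == ".DS_Store"

def clean_zip(src, dst):
    """Copy `src` to `dst`, dropping macOS metadata entries; return kept names."""
    with zipfile.ZipFile(src) as zin:
        # testzip() returns the name of the first member with a bad CRC, or None.
        bad = zin.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        kept = []
        with zipfile.ZipFile(dst, "w") as zout:
            for info in zin.infolist():
                if is_macos_metadata(info.filename):
                    continue
                # Passing the ZipInfo keeps the name, timestamp and compression type.
                zout.writestr(info, zin.read(info))
                kept.append(info.filename)
    return kept

# print(clean_zip("cleaned_files_2.zip", "cleaned_files_2_clean.zip"))
```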

1 Answer


Summarising the comments: the Python code you supplied looks fine, and you say that round-tripping a zip file through S3 shows that it isn't being corrupted.
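The round-trip check from the comments (running `cksum` on the file before upload and again on the downloaded copy) can also be scripted. A minimal sketch, using SHA-256 from `hashlib` in place of the CRC that `cksum` prints; the function names and paths are hypothetical. Matching digests mean the bytes survived the trip unchanged.

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_contents(path_a, path_b):
    """True if both files have identical bytes (compared by SHA-256)."""
    return file_digest(path_a) == file_digest(path_b)

# Hypothetical paths: the zip before upload and the copy downloaded from S3.
# print(same_contents("Zipped_file.zip", "downloaded/Zipped_file.zip"))
```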

That then leaves the question of why you are getting the error `BadZipFile: File is not a zip file`.

If you need further help, could you supply a reproducible example that illustrates the problem? I suspect the issue is with the haystack.utils API you are running (referenced in your comments), but that isn't a module I know.

pmqs