
I'm targeting a *_foo_*.csv file in the hierarchy below:

[image: file hierarchy of 1st.zip]

NB: the 2nd.zip is optional (i.e., the files of the 2nd.zip could be zipped directly in the 1st.zip).

My code below always returns None:

from zipfile import ZipFile
import pandas as pd

fp = r'C:\Users\rendezvous\Downloads\1st.zip'

def zip_to_df(fp):
    zip_file = ZipFile(fp)
    for name in zip_file.namelist():
        if name.endswith('.csv'):
            df = pd.read_csv(zip_file.open(name))
            return df
        elif name.endswith('.zip'):
            zip_to_df(zip_file.open(name))

df = zip_to_df(fp)

Can you explain why, please? I can't figure it out.

  • Does this answer your question? [How to read from a zip file within zip file in Python?](https://stackoverflow.com/questions/12025469/how-to-read-from-a-zip-file-within-zip-file-in-python) – SomeSimpleton Jul 21 '23 at 06:33
  • Thanks @SomeSimpleton but I'm afraid it does not. –  Jul 21 '23 at 06:35
  • Let me try to explain this as quickly and simply as possible. I don't really have a solution for you, as I don't work with zip files often and don't know much about the library. But looking at the documentation for zipfile (https://docs.python.org/3/library/zipfile.html#zipfile-objects): when creating a `ZipFile` object you need to "Open a ZIP file, where file can be a path to a file (a string), a file-like object or a path-like object.", whereas the `.open()` method lets you "Access a member of the archive as a binary file-like object." – SomeSimpleton Jul 21 '23 at 06:42
  • So what is happening is that you're trying to push a file that's open in binary mode into `ZipFile.__init__`, which only takes a path or plain file-like objects. Hopefully that helps you get closer to solving your problem. – SomeSimpleton Jul 21 '23 at 06:44
  • Think carefully about what happens when you make the recursive call. The expression `zip_to_df(zip_file.open(name))` will evaluate to the result, which is then **ignored**, allowing the `for` loop in the current call to proceed. Eventually, the loop runs out of options and reaches the end of the function. It has nothing to do with Zip files, and is in fact a common logical error - please see the linked duplicate. – Karl Knechtel Jul 21 '23 at 07:25
  • If you need to "search" recursively, the problem becomes only slightly harder: if you get a `None` from the recursive call, it's because nothing was found in that nested archive, so keep going; if you find a non-None value, return it so that it can bubble back up the recursion. – Karl Knechtel Jul 21 '23 at 07:26
  • (See also [Finding a key recursively in a dictionary](https://stackoverflow.com/questions/14962485) for an example of that technique applied to a different problem.) – Karl Knechtel Jul 21 '23 at 07:32
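
The "return it so that it can bubble back up" idea from the comments can be sketched as follows. This is a minimal sketch, not a definitive fix: the function name `find_csv` and its `pattern` parameter are made up for illustration, and it assumes nested archives are small enough to read into memory. It returns the first matching CSV as an in-memory buffer, or `None` if nothing matches anywhere in the hierarchy.

```python
import io
from fnmatch import fnmatch
from zipfile import ZipFile


def find_csv(source, pattern='*.csv'):
    """Return the first member matching `pattern` as a BytesIO, or None.

    `source` may be a path or a binary file-like object (hypothetical helper).
    """
    with ZipFile(source) as zip_file:
        for name in zip_file.namelist():
            if fnmatch(name, pattern):
                # Found a matching CSV: hand its bytes back to the caller.
                return io.BytesIO(zip_file.read(name))
            if name.endswith('.zip'):
                # Load the nested archive into memory and search it too.
                nested = io.BytesIO(zip_file.read(name))
                found = find_csv(nested, pattern)
                # Only stop if the nested search found something;
                # otherwise keep looping over the remaining members.
                if found is not None:
                    return found
    # Nothing matched in this archive or any archive nested inside it.
    return None
```

With pandas, the result could then be loaded along the lines of `buf = find_csv(fp, '*_foo_*.csv')` followed by `df = pd.read_csv(buf)` when `buf` is not `None`. The key difference from the code in the question is that the result of the recursive call is checked and returned instead of being discarded.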
