
Below is my most recent attempt. When I print 'current_file', it is always the same (first) .zip file in my directory.

Why does this happen, and how can I iterate so that I get to the next file in my zip directory?

My DIRECTORY_LOCATION has 4 zip files in it.

def find_file(cls):

    listOfFiles = os.listdir(config.DIRECTORY_LOCATION)  
    total_files = 0
    for entry in listOfFiles:
        total_files += 1  
        # if fnmatch.fnmatch(entry, pattern):
        current_file = entry
        print (current_file)

        """"Finds the excel file to process"""
        archive = ZipFile(config.DIRECTORY_LOCATION + "/" + current_file)
        for file in archive.filelist:
            if file.filename.__contains__('Contact Frog'):
                return archive.extract(file.filename, config.UNZIP_LOCATION)

    return FileNotFoundError

find_file usage:

excel_data = pandas.read_excel(self.find_file())

Update:

I just tried changing return to yield at:

yield archive.extract(file.filename, config.UNZIP_LOCATION)

and now I get the error below at the line where I call find_file:

ValueError: Invalid file path or buffer object type: <class 'generator'>

Then I changed the call to use the generator object as suggested in the comments, i.e.:

generator = self.find_file(); excel_data = pandas.read_excel(generator())

and now I get this error:

    generator = self.find_file(); excel_data = pandas.read_excel(generator())
TypeError: 'generator' object is not callable

Here is my main.py, if helpful:

"""Start Point"""
from data.find_pending_records import FindPendingRecords
from vital.vital_entry import VitalEntry
import sys
import os
import config
import datetime
# from csv import DictWriter

if __name__ == "__main__":
    try:
        for file in os.listdir(config.DIRECTORY_LOCATION):
            if 'VCCS' in file:
                PENDING_RECORDS = FindPendingRecords().get_excel_data()
                # Do operations on PENDING_RECORDS
                # Reads excel to map data from excel to vital
                MAP_DATA = FindPendingRecords().get_mapping_data()
                # Configures Driver
                VITAL_ENTRY = VitalEntry()
                # Start chrome and navigate to vital website
                VITAL_ENTRY.instantiate_chrome()
                # Begin processing Records
                VITAL_ENTRY.process_records(PENDING_RECORDS, MAP_DATA)
    except:
        print("exception occurred")
        raise
Dr Upvote
  • How do you run `find_file`? I don't like the `return` in your code: it probably exits after the first `.zip`, and you probably run `find_file` again, so it starts at the first file again and `return` exits again. A function does not remember what it was doing the last time you ran it. You would have to use `yield` instead of `return`. – furas Apr 03 '19 at 19:55
  • @furas is right. You are likely returning `archive.extract` after you successfully extract the first `current_file`, so you never reach the second iteration of the outer `for` loop. – Valentino Apr 03 '19 at 19:59
  • thanks guys, let me try this.. have been stuck on this one for awhile. tried os, glob, everything under the sun to achieve this – Dr Upvote Apr 03 '19 at 20:00
  • With yield it's spitting out: ValueError: Invalid file path or buffer object type: <class 'generator'> – Dr Upvote Apr 03 '19 at 20:15
  • try `generator = self.find_file(); excel_data = pandas.read_excel(generator())`. And remember to create `generator` only once and use it many times. If you create `generator` again then it starts at first file. – furas Apr 03 '19 at 20:16
  • Thanks.. looks like it's spitting this out now lol: 'generator = self.find_file(); excel_data = pandas.read_excel(generator()) TypeError: 'generator' object is not callable' – Dr Upvote Apr 03 '19 at 20:19
  • What do you expect to have in `excel_data`? If I understand correctly, you are trying to read several zipped excel files, am I right? – Valentino Apr 03 '19 at 20:42
  • Yeah, exactly - just one each time the script is executed. I have a for loop around my main.py function for this - right now the script is running 4 times as it's supposed to be per my directory; however it is just unzipping and iterating over the same one excel file - not others. I've isolated the problem to this area. – Dr Upvote Apr 03 '19 at 20:47

1 Answer

It is not tested.

def find_file(cls):
    listOfFiles = os.listdir(config.DIRECTORY_LOCATION)  
    total_files = 0
    for entry in listOfFiles:
        total_files += 1  
        # if fnmatch.fnmatch(entry, pattern):
        current_file = entry
        print (current_file)

        """"Finds the excel file to process"""
        archive = ZipFile(config.DIRECTORY_LOCATION + "/" + current_file)
        for file in archive.filelist:
            if file.filename.__contains__('Contact Frog'):
                yield archive.extract(file.filename, config.UNZIP_LOCATION)

This is just your function rewritten with yield instead of return.

I think it should be used in the following way:

for extracted_archive in self.find_file():
    excel_data = pandas.read_excel(extracted_archive)
    #do whatever you want to do with excel_data here

self.find_file() returns a generator, which should be consumed like an iterator (read this answer for more details).

Try to integrate the loop above into your main script. Each iteration of the loop reads a different file into excel_data, so the body of the loop is also where you should do whatever you need to do with the data.
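
To illustrate the difference between calling the generator function and consuming the generator object, here is a minimal sketch. The function `find_paths` and its file names are hypothetical stand-ins for `find_file`, not part of the original code.

```python
def find_paths():
    """Hypothetical stand-in for find_file(): yields one path per loop pass."""
    for name in ["a.xlsx", "b.xlsx", "c.xlsx"]:
        yield name

gen = find_paths()        # builds a generator object; the body has not started running
first = next(gen)         # runs the body up to the first yield
remaining = list(gen)     # resumes after that yield and drains the rest

try:
    gen()                 # a generator object is not a function, so this raises
except TypeError as exc:
    call_error = str(exc)

print(first, remaining, call_error)
```

This is why `pandas.read_excel(self.find_file())` and `generator()` both fail: the first passes the generator object itself where a path is expected, and the second tries to call it. Pulling values out with a `for` loop or `next()` is what produces the paths.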

Not sure what you mean by:

just one each time the script is executed

Even with yield, if you execute the script multiple times, you will always start from the beginning (and always get the first file). You should read all of the files in the same execution.
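
If one file per run really is the requirement, the position has to be stored outside the process. One possible sketch, assuming it is acceptable to move each zip to a separate folder once it has been handled: the function name `extract_next`, its parameters, and the `done_dir` folder are hypothetical, not taken from the original code.

```python
import os
import shutil
from zipfile import ZipFile

def extract_next(zip_dir, unzip_dir, done_dir, marker="Contact Frog"):
    """Extract the matching member from the first remaining zip, then move
    that zip into done_dir so the next run picks up a different one.

    Returns the extracted path, or None if no remaining zip has a match.
    """
    os.makedirs(done_dir, exist_ok=True)
    for name in sorted(os.listdir(zip_dir)):
        if not name.lower().endswith(".zip"):
            continue  # skip sub-folders and non-zip files
        zip_path = os.path.join(zip_dir, name)
        extracted = None
        with ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if marker in member:
                    extracted = archive.extract(member, unzip_dir)
                    break
        # The archive is closed here, so the zip can be moved safely.
        if extracted is not None:
            shutil.move(zip_path, os.path.join(done_dir, name))
            return extracted
    return None
```

Each call (or each run of the script) then handles the next zip that is still in `zip_dir`. Zips without a matching member are left in place and checked again on later runs.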

Valentino
  • Thanks for attempting; i get no errors with this... but it is still same result; only touching the one file each time..Upvote for the effort tho, thanks :'( – Dr Upvote Apr 04 '19 at 16:53
  • I still don't get one thing: do you really want to extract one excel file each time you execute the script? – Valentino Apr 04 '19 at 17:57
  • Yes. I have folder full of 'zips'. I am iterating through this folder in main.py > I want to unzip and use that zips content (excel file) one at a time (each time script is ran) - move it to different folder after (UNZIP_LOCATION). I had this functionality working when I hardcoded the full .zip file name - in i.e. archive = ZipFile(EXACTFILE) - but that defeats the entire purpose – Dr Upvote Apr 04 '19 at 18:03
  • Now I am really curious to see the full code. If you run your script many times, how do you keep track of which file should be extracted the next time? Even if you hardcode the file names, it should not work (unless you hardcode one file name and edit it by hand each time). – Valentino Apr 04 '19 at 18:09
  • I just added my main.py -- if you look at: def find_file(cls): - everything before the 'Find file comment' - if that is removed and absolute file path is in the archive = ZipFile - yes it works on one file. That is problem... I want to iterate through all.... – Dr Upvote Apr 04 '19 at 18:13
  • I don't see where you call `find_file()` in the main. However, if you need to extract only one excel file each time you run the script, the only way is to write the next file name into a text file and read it when you run the script. There is no way a program remembers its variables after it has terminated. That, or maybe check whether a zip file has been extracted already. – Valentino Apr 04 '19 at 18:40
  • Or just delete each zip as it's processed? – Dr Upvote Apr 04 '19 at 18:46
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/191277/discussion-between-valentino-and-peter-gibbons). – Valentino Apr 04 '19 at 19:06