I have a function which I want to enumerate through all files and folders from a target folder. When/if it finds rar files I want it to extract them and then delete them. In the case of multi-part archives it will also check for and delete the remaining files (which have already been extracted with the first volume).
I was using os.listdir in a for loop, but this approach has two problems: a) I don't think it will handle subfolders unless I write a recursive function for them (which I don't want to do because recursion hurts my head). b) Because os.listdir builds its list of names only once, at the start of the loop, when the loop reaches a file name that was already removed in a prior iteration I get a file-not-found failure.
It appears os.walk may be better for "a)" above, and my research so far suggests that I should be able to update the lists os.walk yields in real time on each iteration. However, I can't figure out how to do this.
I've got something like this:
for root, dirs, files in os.walk('d:\\test'):
    for file in files:
        print 'files (before remove): ', file, files
        # This is where I would do some operation that deletes one or more files.
        files.remove(file)
        print 'files (after remove): ', file, files
However the output is like this:
D:\test>d:\Python27\python.exe d:\file.py
files (before remove): Crystal.part01.rar ['Crystal.part01.rar', 'Crystal.part02.rar', 'Crystal.part03.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (after remove): Crystal.part01.rar ['Crystal.part02.rar', 'Crystal.part03.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (before remove): Crystal.part03.rar ['Crystal.part02.rar', 'Crystal.part03.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (after remove): Crystal.part03.rar ['Crystal.part02.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (before remove): Crystal.part05.rar ['Crystal.part02.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (after remove): Crystal.part05.rar ['Crystal.part02.rar', 'Crystal.part04.rar', 'Crystal.part06.rar']
I think this makes sense: we can see the list getting updated. However, because I am already inside the (second) for statement, which is walking that same list by position, it carries on from the next position in the now-shortened list. Every remaining item has shifted back by one, which creates a "skip" effect.
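The skip effect described above can be reproduced with a plain list, with no file system involved. The sketch below is only an illustration; the helper names `remove_while_iterating` and `remove_over_snapshot` are made up for this example. It assumes the usual workaround of looping over a copy of the list (`list(names)` or `names[:]`) while removing from the original.

```python
def remove_while_iterating(names):
    """Reproduce the skip: remove each visited item from the list being looped over."""
    names = list(names)          # work on a private copy of the caller's list
    visited = []
    for name in names:           # iterates by position over the list being shrunk
        visited.append(name)
        names.remove(name)
    return visited


def remove_over_snapshot(names):
    """Loop over a snapshot so removals from the original cannot shift the loop."""
    names = list(names)
    visited = []
    for name in list(names):     # list(names) is a separate copy taken once
        visited.append(name)
        names.remove(name)
    return visited


parts = ['Crystal.part%02d.rar' % i for i in range(1, 7)]
skipping = remove_while_iterating(parts)   # visits only part01, part03, part05
complete = remove_over_snapshot(parts)     # visits all six parts
```

The first function visits the same three names as the output shown earlier; the second visits every name because the loop and the removals no longer share one list.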
How can I operate on each file in the directory while letting the calling loop know to skip an item that has already been removed?
Update - I may be incorrect in assuming this can be done. What gave me this idea was this snippet from the Python docs:
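One possible way to approach this, sketched under several assumptions: the loop runs over a sorted copy of `files`, and a set remembers which names were already deleted so later iterations skip them. The names `PART_RE`, `volumes_for`, `process_archives` and the `extract` callback are all hypothetical. The naming scheme `<base>.partNN.rar` is inferred from the example output only, so other schemes (such as `.r00`, `.r01`) are not covered, and the actual extraction step is left to the caller.

```python
import os
import re

# Assumed naming scheme, taken from the example output: <base>.partNN.rar
PART_RE = re.compile(r'^(?P<base>.+)\.part(?P<num>\d+)\.rar$', re.IGNORECASE)


def volumes_for(name, files):
    """Return every volume of the archive `name` belongs to, first volume first."""
    match = PART_RE.match(name)
    if not match:
        return [name]                      # plain single-file archive
    base = match.group('base')
    siblings = []
    for other in files:
        m = PART_RE.match(other)
        if m and m.group('base') == base:
            siblings.append((int(m.group('num')), other))
    return [other for _, other in sorted(siblings)]   # ordered by part number


def process_archives(top, extract=None):
    """Walk `top`, handle each archive once, then delete all of its volumes.

    `extract` is a hypothetical callback that receives the first volume's path.
    Returns the paths of the first volumes that were handled.
    """
    processed = []
    for root, dirs, files in os.walk(top):
        handled = set()                    # names already deleted in this folder
        for name in sorted(files):         # sorted() returns a new list
            if name in handled or not name.lower().endswith('.rar'):
                continue
            volumes = volumes_for(name, files)
            first = os.path.join(root, volumes[0])
            if extract is not None:
                extract(first)             # extraction itself is not shown here
            for volume in volumes:
                os.remove(os.path.join(root, volume))
                handled.add(volume)
            processed.append(first)
    return processed
```

Because the set is consulted before anything else, `Crystal.part02.rar` and later parts are never touched again after the first volume has been processed.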
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
On reading it again I see it only mentions dirnames and not filenames, so while I still don't understand the exact method to accomplish this, it looks like you may only be able to manipulate dirnames in place.
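For reference, the in-place modification of dirnames that the quoted passage describes might look like the sketch below. The function name `walk_skipping` and the `skip_names` parameter are invented for illustration. It relies on slice assignment so that the list object held by walk() is the one that changes; editing the files list the same way only affects the caller's copy and has no influence on the walk.

```python
import os


def walk_skipping(top, skip_names):
    """Yield (root, files) for each folder under `top`, pruning unwanted subfolders."""
    for root, dirs, files in os.walk(top):   # topdown=True is the default
        # Slice assignment mutates the list that walk() itself holds.
        # Writing `dirs = [...]` would only rebind the local name and prune nothing.
        dirs[:] = [d for d in dirs if d not in skip_names]
        yield root, files
```

A call such as `for root, files in walk_skipping('d:\\test', {'backup'}):` would then never descend into any folder named `backup` (a made-up folder name).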