os.walk folder exclusion based on .txt file

Question

I would like to have a Folders_To_Skip.txt file that lists directories, one per line.

For example:

A:\\stuff\a\b\
A:\\junk\a\b\

Some files are breaking the .csv record compilation this script is used for, and I want to exclude directories that I have no use for reading anyway.

In the locate function I tried to implement the approach from Excluding directories in os.walk, but I can't get it to work with directories in a list, let alone with a list read from a text file. When I print the files accessed, the output still includes files in the directories I attempted to exclude.

Could you also explain whether the solution would exclude only the specific directories listed (not the end of the world) or whether it can also exclude their subdirectories (which would be more convenient)?

Right now, the code preceding locate makes it easy to look up the control text files and load their contents as lists for the rest of the script. It assumes all control files are in the same location, but that location can change depending on who runs the script and from where.

Also, for testing purposes, Drive_Locations.txt is set up as:

A
B

Here is the current script:

import os
from tkinter import filedialog
import fnmatch

input('Press Enter to select any file in writing directory or associated control files...')
fname = filedialog.askopenfilename()
fpath = os.path.split(fname)

# Set location for Drive Locations to scan
Disk_Locations = os.path.join(fpath[0], r'Drive_Locations.txt')
# Set location for Folders to ignore such as program files
Ignore = os.path.join(fpath[0], r'Folders_To_Skip.txt')

# Opens list of Drive Locations to be sampled
with open(Disk_Locations, 'r') as Drives:
    Drive = Drives.readlines()
    Drive = [x.replace('\n', '') for x in Drive]
# Iterable list for directories to be excluded
with open(Ignore, 'r') as SkipF1:
    Skip_Fld = SkipF1.readlines()
    Skip_Fld = [x.replace('\n', '') for x in Skip_Fld]

# Locates file in entire file tree from previously established parent directory.
def locate(pattern, root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        dirs[:] = [d for d in dirs if d not in Skip_Fld]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)

for disk in Drive:
    # Formats Drive Location for acceptance
    disk = disk.upper()
    if ':' not in disk:
        disk = disk + ':'
    # Changes the current disk drive
    if os.path.exists(disk):
        os.chdir(disk)
    # If disk incorrect skip to next disk
    else:
        continue
    for exist_csv in locate('*.csv'):
        # Skip compiled record output files in search
        print(exist_csv)

Hi @tv006, welcome to the site. You've given us *a lot* of code here, and it's not entirely clear what you're asking about. It seems to me like the most obvious place to implement the filtering you seem to want would be in the `locate` function where you're doing the `os.walk` call (it can't work anywhere else), but you don't seem to be attempting to modify `dirs` at all. Can you cut out some of the irrelevant code from your example, and perhaps show us what you've actually attempted, with respect to the directory skipping? — Blckknght, Feb 04 '20 at 19:31
You really need to extract and provide a [mcve]. As a new user, please also take the [tour] and read [ask]. In particular, saying "I tried something vaguely specified" followed by "it didn't work" is not helpful. Provide facts, not interpretations. — Ulrich Eckhardt, Feb 04 '20 at 19:44
Variable and function names should follow the `lower_case_with_underscores` style. The inconsistent naming and the sheer amount of code make this practically impossible to read. — AMC, Feb 04 '20 at 22:16
I can also see a bunch of areas which could be simplified, notably at the beginning of the program, where you're reading all the files. — AMC, Feb 04 '20 at 22:17
AMC - Unfortunately I can't think of a way to clean up the section that reads the files. I'm trying to set this up so coworkers can run it without any coding: they only need to edit the text files if they want to change something, which is more user-friendly in my opinion. The most I think that section could benefit from is some way to close the `tk` window that pops up with the Windows Explorer file dialog. — tv006, Feb 05 '20 at 15:06
That's not at all related to your actual question, though. Reduce this to a [mre] and maybe ask a new question about `tk`, similarly reduced to the essential code for reasoning about the specific problem you are trying to solve. — tripleee, Feb 05 '20 at 16:49
tripleee - I was simply responding to AMC's comment. In my opinion this is easy to reproduce: add any drive letter to a `Drive_Locations.txt` file placed next to `Folders_To_Skip.txt`, make sure that drive's file tree has some .csv files to find, then add one of their directories to `Folders_To_Skip.txt` and see whether it disappears from the printed output. — tv006, Feb 05 '20 at 20:34
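A GUI-free reproduction of the kind the commenters ask for could look like the sketch below. It keeps the question's filter unchanged but passes the skip list in as a parameter. Instead of scanning real drives, it builds a throwaway tree in a temporary directory. The folder and file names are made up for illustration.

```python
import fnmatch
import os
import tempfile


def locate(pattern, root, skip_fld):
    # Same filter as the question's locate(): bare names compared with full paths
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        dirs[:] = [d for d in dirs if d not in skip_fld]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tree: <tmp>/good.csv and <tmp>/stuff/a/b/broken.csv
    skipped = os.path.join(tmp, 'stuff', 'a', 'b')
    os.makedirs(skipped)
    for name in (os.path.join(tmp, 'good.csv'), os.path.join(skipped, 'broken.csv')):
        open(name, 'w').close()

    # The skip list holds a full path, as Folders_To_Skip.txt would
    found = sorted(os.path.basename(p) for p in locate('*.csv', tmp, [skipped]))

print(found)  # ['broken.csv', 'good.csv']: the "skipped" folder was still read
```

Both files come back, which matches the symptom described in the question.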

tripleee · Answer 1 · 2020-02-05T15:04:11.733


The central bug here is that the dirs list yielded by os.walk() contains bare directory names, not full paths. For example, when os.walk() is visiting A:\stuff\a, the directory you want to skip is listed simply as b, not as A:\stuff\a\b. As a result, your skip logic never finds anything to remove from the list of subdirectories of the current directory.
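The mismatch can be replayed without touching a real drive by using `ntpath`, the module behind `os.path` on Windows, which can also be imported on other operating systems. The `path` and `dirs` values are hypothetical stand-ins for what os.walk() would yield while visiting `A:\stuff\a`.

```python
import ntpath  # Windows path rules, importable on any OS

skip_fld = [r'A:\stuff\a\b']
# Hypothetical values os.walk() would yield while visiting A:\stuff\a
path, dirs = r'A:\stuff\a', ['b', 'c']

by_name = [d for d in dirs if d not in skip_fld]                      # question's test
by_full_path = [d for d in dirs if ntpath.join(path, d) not in skip_fld]

print(by_name)       # ['b', 'c']: 'b' is never equal to 'A:\stuff\a\b'
print(by_full_path)  # ['c']
```

Joining `path` with each bare name rebuilds the full path, which is the form a list of full paths can actually match.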

Here's a refactoring which examines the current directory directly instead.

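def locate(pattern, root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        # path is the full path of the directory being visited, so it can be
        # compared directly with the full paths in Skip_Fld
        if path not in Skip_Fld:
            for filename in fnmatch.filter(files, pattern):
                yield os.path.join(path, filename)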
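One caveat with the snippet above is that os.walk() still descends into a skipped directory. Files in its subdirectories (say A:\stuff\a\b\c) are therefore still yielded, because their path is not in the list. Pruning dirs[:] by full path instead stops the descent. That should also cover the question about subdirectories, since everything below a listed folder is then excluded. The sketch below is written under that assumption. The skip list is passed as a parameter (skip_fld is an illustrative name), and the demo uses a made-up temporary tree.

```python
import fnmatch
import os
import tempfile


def locate(pattern, root=os.curdir, skip_fld=()):
    # Normalise the skip list once; normcase makes matching case-insensitive on Windows
    skip = {os.path.normcase(os.path.normpath(p)) for p in skip_fld}
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        # Prune in place (topdown=True): os.walk will not descend into removed
        # entries, so every subdirectory of a skipped folder is excluded as well
        dirs[:] = [d for d in dirs
                   if os.path.normcase(os.path.join(path, d)) not in skip]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tree: the unwanted file sits two levels below the skipped folder
    deep = os.path.join(tmp, 'stuff', 'a', 'b', 'c')
    os.makedirs(deep)
    open(os.path.join(deep, 'broken.csv'), 'w').close()
    open(os.path.join(tmp, 'stuff', 'good.csv'), 'w').close()

    skip_list = [os.path.join(tmp, 'stuff', 'a', 'b')]
    found = [os.path.basename(p) for p in locate('*.csv', tmp, skip_list)]

print(found)  # ['good.csv']
```

This sketch only prunes folders below the scan root. If the root itself is listed in the file, it is not skipped.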

The abspath call is important to keep; good on you for including that in your attempt.
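The reason abspath matters is that os.walk() yields the root exactly as it was given and builds every other path by joining names onto it. Walking os.curdir therefore produces relative paths starting with `.`, which can never equal the absolute entries in Folders_To_Skip.txt. A small demonstration on a temporary tree (the directory names are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'stuff', 'a'))
    previous = os.getcwd()
    os.chdir(tmp)
    try:
        # Every yielded path starts with the root exactly as it was passed in
        relative = [path for path, _, _ in os.walk(os.curdir)]
        absolute = [path for path, _, _ in os.walk(os.path.abspath(os.curdir))]
        cwd = os.getcwd()
    finally:
        os.chdir(previous)  # leave the temp dir so it can be deleted

print(relative)  # ['.', './stuff', './stuff/a'] on Linux
print(absolute)
```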

Your list of directories to skip should have single backslashes, or perhaps forward slashes, and probably no final directory separator (I fortunately have no way to check how these are reported by os.walk() on Windows).
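Stray whitespace, blank lines, a trailing separator or a doubled backslash in the text file all break a plain string comparison, so one option is to normalise the entries as they are read. In the sketch below, load_skip_list is a hypothetical helper, not something from the question. The last lines use ntpath to show, on any OS, what os.path.normpath should do to the question's A:\\stuff\a\b\ entry on Windows.

```python
import ntpath
import os
import tempfile


def load_skip_list(txt_path):
    # Hypothetical helper: one directory per line; blank lines are ignored
    skip = set()
    with open(txt_path, 'r') as fh:
        for line in fh:
            entry = line.strip()  # drops the newline and stray spaces
            if entry:
                # normpath removes the trailing separator; normcase lowercases
                # on Windows (and is a no-op elsewhere) for a case-insensitive match
                skip.add(os.path.normcase(os.path.normpath(entry)))
    return skip


with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, 'Folders_To_Skip.txt')
    with open(txt, 'w') as fh:
        fh.write(os.path.join(tmp, 'stuff', 'a', 'b') + os.sep + '\n')  # trailing separator
        fh.write('\n')                                                  # blank line
        fh.write('  ' + os.path.join(tmp, 'junk') + '  \n')             # stray spaces
    skip_fld = load_skip_list(txt)
    expected = {os.path.normcase(os.path.join(tmp, 'stuff', 'a', 'b')),
                os.path.normcase(os.path.join(tmp, 'junk'))}

print(skip_fld == expected)  # True

# Windows rules, runnable anywhere: the doubled and trailing backslashes go away
windows_entry = ntpath.normpath('A:\\\\stuff\\a\\b\\')
print(windows_entry)  # A:\stuff\a\b
```

Comparing these normalised entries against os.path.normcase(os.path.join(path, d)) keeps both sides of the comparison in the same form.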

answered Feb 05 '20 at 14:58 by tripleee (edited Feb 05 '20 at 15:04)


I have not examined the rest of the script; there may be additional bugs. – tripleee Feb 05 '20 at 14:59
I tried it and modified it to use a for loop, since I do want an iterable, but it doesn't seem to work. I'm wondering whether the way Python converts the paths it reads into strings is making them unusable for the comparison. I tried as many formatting options as I could think of on my paths to exclude, but nothing seems to stick. I even considered that it required going up one directory level, but that doesn't seem to work either. – tv006 Feb 05 '20 at 15:57
Rather than guess at the problem, add strategic `print` statements to show the values of your variables, and compare against what you expect. – tripleee Feb 05 '20 at 16:51
That's the thing, though: based on my `print` statements, when a path is read in as part of a list of strings, the input =/= output. My assumption is that I put the equivalent of `path` into a list of strings, and in the process of Python reading it in as a string it essentially becomes `path*`, if that makes sense. So `Skip_Fld` will never contain `path` items, since `path*` items are what get iterated. The only way I've solved this before was by using raw strings for file paths, but I don't know how to apply that when building a list. – tv006 Feb 05 '20 at 20:16
Again, if the script could be reduced to a [mre] and you could mention and perhaps even demonstrate this observation in your question, you'd be more likely to get help to reach a solution. The pesky GUI in particular obscures the problem because it's harder to know *exactly* how to reproduce your problem. Take out all moving parts so that only the problem remains. – tripleee Feb 06 '20 at 05:25
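On the raw-string question: a raw string (`r'...'`) only changes how backslashes are interpreted in a string literal typed into source code. Text read from a file never goes through escape processing. Whatever is in `Folders_To_Skip.txt` therefore arrives character for character, and no raw-string handling is needed when building the list. A refinement of the `print` suggestion above is to print `repr()` of each entry, which reveals hidden differences such as a trailing backslash. A small sketch with a made-up file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, 'Folders_To_Skip.txt')
    with open(txt, 'w') as fh:
        # The file receives the characters A:\stuff\a\b\ followed by a newline
        fh.write('A:\\stuff\\a\\b\\' + '\n')
    with open(txt, 'r') as fh:
        entry = fh.readline().replace('\n', '')  # same cleanup as the question

# No escape processing on read: the entry equals the raw-string spelling
print(entry == r'A:\stuff\a\b' + '\\')  # True
# repr() makes the trailing backslash visible
print(repr(entry))                      # 'A:\\stuff\\a\\b\\'
# os.walk() paths have no trailing separator, so this comparison fails
print(entry == r'A:\stuff\a\b')         # False
```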
