
I am trying to search a given directory for a specific file and, if that file does not exist, have the code say "File does not exist". With os.walk I can get this to work; however, it hits every single file that isn't the specified file and prints "File does not exist" for each one. I know that this is how os.walk works, but I was not sure whether there is a way to print the result only once, whether the file is found or not.

Folder structure:

root folder
|-- Project Folder
    |-- file.xml
    |-- other files/subfolders

What I want the code to do is go inside "Project Folder", recursively search for "file.xml", and print "Found" once if it is found, or "Not found" once otherwise.

The code is:

import os

def check_file(x):  # x = root folder dir
    for d in next(os.walk(x))[1]:  # if I understand correctly, [1] will be Project Folder
        for root, directories, files in os.walk(x):
            for name in files:
                if "file.xml" in name:
                    print("found")
                else:
                    print("File Missing")

If I change the code to

            for name in files:
                if "file.xml" in name:
                    print("found")
                else:
                    pass

This technically works as intended, but it never reports when the file is missing, so it isn't a good solution. It would be easier if I could give the code a specific path to look in. However, the user can place the 'root folder' anywhere on their machine, and the 'project folder' has a different name for each project, so I don't think I can hard-code a specific location.

Is there a way to get this to work with os.walk, or would another method work best?

Avila
  • `glob.glob` should help here: https://docs.python.org/3/library/glob.html#glob.glob – slothrop Jun 01 '23 at 14:55
  • Set a flag and break the loop when you find what you're looking for, check whether the flag is set after the loop exits. As you've discovered, it does no good to report when an individual file doesn't match your target; you need to check for the lack of any files that _do_ match your target, and you can't determine that until you've gone through the whole set. – Charles Duffy Jun 01 '23 at 15:17
  • @slothrop I can see where the logic is going while reading the page, but I'm not sure how to incorporate it fully. Would I also need to include if/for logic to print out the results? Also, would it automatically accept (x) as the path and know to search for 'file.xml' inside the glob brackets? – Avila Jun 01 '23 at 16:21
  • @Avila I made an answer to demonstrate these – slothrop Jun 01 '23 at 16:29
  • @Avila have you considered os.scandir() or os.listdir()? – Vincent Laufer Jun 01 '23 at 16:52
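The flag-and-break approach suggested in the comments can be sketched as follows. This is a minimal illustration rather than code from the thread; the function name `file_exists` and the exact-name comparison against `file.xml` are assumptions.

```python
import os


def file_exists(root_dir, target="file.xml"):
    """Walk root_dir recursively; report once whether a file named target exists."""
    found = False  # flag: set as soon as a match is seen
    for root, dirs, files in os.walk(root_dir):
        if target in files:  # exact file-name match in this directory
            found = True
            break  # stop walking; the answer is already known
    # Only after the loop has ended can a missing file be reported
    print("Found" if found else "File does not exist")
    return found
```

Because the message is printed after the loop, it appears exactly once per call, regardless of how many files the walk visits.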

3 Answers


The glob module is very convenient for this kind of wildcard-based recursive search. In particular, the ** wildcard matches a directory tree of arbitrary depth, so you can find a file anywhere among the descendants of your root directory.

For example (note that the root_dir argument of glob.glob requires Python 3.10 or later):

import glob

def check_file(x):  # where x is the root directory for the search
    files = glob.glob('**/file.xml', root_dir=x, recursive=True)
    if files:
        print(f"Found {len(files)} matching files")
    else:
        print("Did not find a matching file")
slothrop
  • This is great, thanks! :) If I ever needed to glob another file, could I pass a tuple to glob, or would I need a separate glob call for each of the other files? – Avila Jun 01 '23 at 18:18
  • You would do another call to `glob.glob`. So you might want to make a tuple of your filenames, and do a `for` loop over that tuple, calling `glob` inside the loop. – slothrop Jun 01 '23 at 18:25

See [Python.Docs]: os.walk(top, topdown=True, onerror=None, followlinks=False).

You don't need the nested loops. On each iteration, just check whether the base file name is in the third element of the tuple that os.walk yields (the list of file names in the current directory).
This implementation handles the case of a file being present in multiple directories. If you only need to know whether the file exists (stopping at the first match), use the search_file_once function instead.

code00.py:

#!/usr/bin/env python

import os
import sys


def search_file(root_dir, base_name):
    found = 0
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            print("Found: {:s}".format(os.path.join(root, base_name)))
            found += 1
    if not found:
        print("Not found")


# Stop at the first match: the loop's else clause runs only if no break happened
def search_file_once(root_dir, base_name):
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            print("Found: {:s}".format(os.path.join(root, base_name)))
            break
    else:
        print("Not found")


def main(*argv):
    root = os.path.dirname(os.path.abspath(__file__))
    files = (
        "once.xml",
        "multiple.xml",
        "notpresent.xml",
    )
    for file in files:
        print("\nSearching recursively for {:s} in {:s}".format(file, root))
        search_file(root, file)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackExchange\StackOverflow\q076383189]> sopr.bat
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###

[prompt]> tree /a /f
Folder PATH listing for volume SSD0-WORK
Volume serial number is AE9E-72AC
E:.
|   code00.py
|
\---dir0
    +---dir00
    +---dir01
    |       multiple.xml
    |       once.xml
    |
    \---dir02
        \---dir020
                multiple.xml


[prompt]>
[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.10_test0\Scripts\python.exe" ./code00.py
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32


Searching recursively for once.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\once.xml

Searching recursively for multiple.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\multiple.xml
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir02\dir020\multiple.xml

Searching recursively for notpresent.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Not found

Done.

This is just one of several possible ways of doing this. Check [SO]: How do I list all files of a directory? (@CristiFati's answer) for more details.

CristiFati
  • Hey, thanks for the answer! I think this is what I need to do. I added it to my code and almost got there. I only need to verify that the file exists once (it will only ever be included once), but I'm having issues with the break logic in the for loop. I have it as if base_name in files: print("Found...") followed by break. I think this breaks the inner loop, but it still loops around over and over. Adding a break outside of this, for the for loop, also doesn't seem to work. Am I misusing break? Sorry if this is confusing; the limited character space makes it a little hard to type everything out. – Avila Jun 01 '23 at 16:25
  • I added *search_file_once*, which should do exactly what you're after. – CristiFati Jun 01 '23 at 16:34
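On the `break` question above: `break` only leaves the innermost loop it appears in, so a `break` inside `for name in files:` does not stop `os.walk`. One way around this, sketched below under the assumption that only the first match matters, is to put the search in a function and `return` as soon as the file is found, since `return` exits every loop at once. The name `find_first` is hypothetical.

```python
import os


def find_first(root_dir, base_name):
    """Return the full path of the first file named base_name, or None."""
    for root, dirs, files in os.walk(root_dir):
        for name in files:  # nested loop kept on purpose to show the difference
            if name == base_name:
                # return leaves both loops at once, unlike break
                return os.path.join(root, name)
    return None  # reached only if the whole walk finished without a match
```

Calling `path = find_first(x, "file.xml")` and then printing "Found" if `path` is not `None` (else "Not found") gives exactly one line of output.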

I have written a function like this, and several others, in the past. I want to provide them all for context; some will work for your case with minimal or no modification.

## Find ALL matches (not just one):
## Example usage: findAll('file.xml', '/path/to/dir')  (exact names only; see findMatch for wildcards)

import os

def findAll(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result  # returned only after the whole tree has been walked

## A function that keeps walking until all target files are found (or the tree is exhausted)
import os

def findProjectFiles(folder, targetFiles):
    remaining = set(targetFiles)  # names not found yet
    filePaths = []
    for root, dirs, files in os.walk(folder):
        for f in files:
            if f in remaining:
                filePaths.append(os.path.abspath(os.path.join(root, f)))
                remaining.discard(f)
        if not remaining:  # every target found: stop walking early
            break
    return filePaths

# find the path of the first file with this name (None if there is none):

def findPaths(name, path):
    import os
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)
    return None

## The returned value is None when nothing matches, so the "not found" case is easy to test for.

## Similar, but this will match a pattern (i.e. it does not have to be an exact file name match),
## e.g. findMatch('*.xml', '/path/to/dir').

import os
import fnmatch

def findMatch(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result  # returned only after the whole tree has been walked
Vincent Laufer
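Along the lines of the os.scandir() suggestion in the comments, the same search can be written without os.walk by scanning directories with os.scandir and an explicit stack. This is a sketch under stated assumptions: `find_with_scandir` is a hypothetical name, symlinked directories are not followed, and unreadable directories are skipped.

```python
import os


def find_with_scandir(root_dir, name="file.xml"):
    """Depth-first search using os.scandir; return the first matching path or None."""
    stack = [root_dir]  # directories still to be scanned
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                for entry in it:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # scan this subdirectory later
                    elif entry.name == name and entry.is_file():
                        return entry.path  # first match ends the search
        except PermissionError:
            continue  # skip directories that cannot be read
    return None  # every reachable directory was scanned without a match
```

As with the other variants, the caller can print "Found" or "Not found" once, based on whether the result is `None`.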