I need to exclude a few directories, or scan only some of them, while using os.walk(). I am trying to get the most recent files. I learned how to do this from this post, but it only returns one file. For my project I need a list of the 5 or more most recent files. This post shows how to scan only a few directories, but I have no idea how to combine it with the answer in the first post.

I want to exclude the directory that contains the most recently modified file. If Folder 3 holds the most recently modified file, then the next time I scan, looking for the 2nd, 3rd, and so on, I want to exclude that directory.

Here is my file layout:

MainFile(CurrentOne)
|
|-- Projects(the one I am scanning)
    #the following folders all have images in them but they are created at the same time as the folder
    |-- Folder 1
    |
    |-- Folder 2  
    |
    |-- Folder 3
    |
    |-- etc...

My previous approach was:

I can't show the code because I have deleted it, but I can explain it:

First: I got a list of the directories in the folder using os.listdir(Projects)

Second: I checked whether there were more than 5, or 5 or fewer

Third: I went into each folder (I had put them in a list in the first step) and used stats = os.stat(dirname) to get info about it.

Fourth: I put all of the times in a list using recent.insert(0, stats[8])

Lastly: I compared all the times and took 5 of them, but they were all incorrect.
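
The steps above can be sketched roughly as follows. This is only an illustration, not the deleted original: the function name `recent_dirs` and the parameter `count` are made up here. Note that on most filesystems a directory's modification time changes only when entries are added, removed or renamed inside it, not when an existing file is edited, which may be one reason the times looked wrong.

```python
import os

def recent_dirs(projects, count=5):
    """Return up to `count` subdirectory names of `projects`, newest first."""
    stamped = []
    for name in os.listdir(projects):
        path = os.path.join(projects, name)
        if os.path.isdir(path):
            # st_mtime is the same value as stats[8] in the tuple form
            stamped.append((os.stat(path).st_mtime, name))
    stamped.sort(reverse=True)  # largest (newest) mtime first
    return [name for _, name in stamped[:count]]
```

Sorting the (mtime, name) pairs avoids comparing the times by hand, and slicing handles the "5 or fewer" case without a separate check.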

Edit

Once I get the most recently modified file, I want to exclude its directory from being scanned, or scan only the other directories. For example, suppose Folder 1 was modified most recently and Python displayed Folder 1. I would then want to exclude that directory while scanning for the second most recently modified directory.
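
One way to exclude a directory during the scan is to prune `dirs` in place, which stops os.walk() (in its default top-down mode) from descending into it. A minimal sketch, assuming a hypothetical helper `newest_file` and a hypothetical `excluded` parameter holding the directories to skip:

```python
import os

def newest_file(top, excluded=()):
    """Return (mtime, path) of the newest file under `top`, or None.

    Directories listed in `excluded` are skipped together with
    everything below them.
    """
    excluded = {os.path.abspath(d) for d in excluded}
    newest = None
    for root, dirs, files in os.walk(top):
        # Editing dirs in place stops os.walk from descending into them.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) not in excluded]
        for fname in files:
            path = os.path.join(root, fname)
            mtime = os.stat(path).st_mtime
            if newest is None or mtime > newest[0]:
                newest = (mtime, path)
    return newest
```

To get the second result, call it again with the directory of the first result in `excluded`, for example `newest_file('Projects', excluded=[os.path.dirname(first[1])])`.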

Dodu
  • Your title doesn't match the question description. What are the directories you want to exclude? – Barmar May 25 '22 at 17:37
  • If you have a list of directories you want to exclude, then something like `for root, dirs, files in os.walk(...): if root in list_of_dirs_to_exclude: continue` and then proceed after the `if` to process all the others. – tripleee May 25 '22 at 17:48
  • I have edited the question @Barmar to make my question more clear – Dodu May 26 '22 at 03:21
  • @tripleee thanks for the help I have posted an answer for others if needed – Dodu May 26 '22 at 03:25

1 Answer

After reading @tripleee's comment, I wrote this piece of code that gets the most recently modified files.

import os

os.chdir('Folder')
projloc = os.getcwd() # getting the folder to scan

list_of_dirs_to_exclude = []

def get_recent_files():
    max_mtime = 0
    max_dir = None  # stays None if no file is found in the remaining directories
    max_file = None

    for root, dirs, files in os.walk(projloc):
        if root not in list_of_dirs_to_exclude: # I added the `not`, unlike @tripleee's comment
            for fname in files:
                full_path = os.path.join(root, fname)
                mtime = os.stat(full_path).st_mtime
                if mtime > max_mtime:
                    max_mtime = mtime
                    max_dir = root
                    max_file = fname

    if max_dir is None: # nothing left to scan, so stop instead of recursing forever
        return

    list_of_dirs_to_exclude.insert(0, max_dir)
    print(max_file)

    if len(list_of_dirs_to_exclude) < 5: # You can use whatever number you want, such as 6, 7, 4 etc...
        get_recent_files()

get_recent_files()

Here is an updated version if you want all of the code in a single function:

def get_recent_files():
    list_of_dirs_to_exclude = []

    # projloc is predefined for me. I got it using the same method as in the code above
    while len(list_of_dirs_to_exclude) < 5:
        max_mtime = 0  # reset before every scan
        max_dir = None
        max_file = None

        for root, dirs, files in os.walk(projloc):
            if root not in list_of_dirs_to_exclude:
                for fname in files:
                    full_path = os.path.join(root, fname)
                    mtime = os.stat(full_path).st_mtime
                    if mtime > max_mtime:
                        max_mtime = mtime
                        max_dir = root
                        max_file = fname

        if max_dir is None:  # no files left in the remaining directories
            break

        list_of_dirs_to_exclude.insert(0, max_dir)
        print(max_file)
Dodu
  • There is no need to `os.chdir` (twice!) or to get `os.getcwd`. See [What exactly is current working directory?](https://stackoverflow.com/questions/45591428/what-exactly-is-current-working-directory/66860904) – tripleee May 26 '22 at 07:05
  • This seems inherently flawed; you need to identify the directories you want to exclude before you can exclude them. Scanning the entire file tree five times recursively is inefficient and complicated. And this just gets the recent directories, and doesn't show how to actually do something with the rest, which is what your question seems to be asking. – tripleee May 26 '22 at 07:11
  • For the first point, I have adapted that in my code, since all my folders are predefined. I'll edit the answer to make it better. For the second, I got the code that checks all the files for the most recent one from [this post](https://stackoverflow.com/questions/2731014/finding-most-recently-edited-file-in-python). I couldn't find a better way in the other posts I looked at. I also experimented myself but failed. The speed is good enough for me – Dodu May 26 '22 at 07:41
  • Another way is to put it inside a Thread, making it run faster – Dodu Jun 26 '22 at 16:14
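
The repeated scans discussed in the comments can probably be avoided by walking the tree once and remembering the newest file in each directory. A rough sketch under that assumption; the name `recent_files_per_dir` and the parameter `count` are hypothetical and do not come from the answer:

```python
import heapq
import os

def recent_files_per_dir(top, count=5):
    """Return up to `count` (mtime, path) pairs, one per directory, newest first."""
    newest_in_dir = {}
    for root, dirs, files in os.walk(top):
        for fname in files:
            path = os.path.join(root, fname)
            mtime = os.stat(path).st_mtime
            # Keep only the newest file seen so far in this directory.
            if root not in newest_in_dir or mtime > newest_in_dir[root][0]:
                newest_in_dir[root] = (mtime, path)
    # Pick the `count` directories whose newest file is most recent.
    return heapq.nlargest(count, newest_in_dir.values())
```

This should give the same files as the answer's loop (the newest file of each directory, limited to `count` directories) while reading every file's timestamp only once.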