
I've created a list that contains file paths to files that I want to delete. What's the most Pythonic way to search through a folder and its subfolders for these files, then delete them?

Currently I'm looping through the list of file paths, then walking through a directory and comparing the files in the directory to the file that is in the list. There has to be a better way.

for x in features_to_delete:

    name_checker = str(x) + '.jpg'
    print 'this is name checker {}'.format(name_checker)

    for root, dir2, files in os.walk(folder):
        print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)

        for b in files:
            if b.endswith('.jpg'):
                local_folder = os.path.join(folder, root)
                print 'Here is name of file {}'.format(b)
                print 'Here is name of name checker {}'.format(name_checker)

                if b == name_checker:
                    counter += 1
                    print '{} needs to be deleted..'.format(b)
                    #os.remove(os.path.join(local_folder, b))
                    print 'Removed {} \n'.format(os.path.join(day_folder, b))

                else:
                    print 'This file can stay {} \n'.format(b)
            else:
                pass

So to clarify: I loop through the entire list of features to delete, and on every iteration I also walk every single file in the directory and all of its subdirectories, comparing each file to the current entry in the list. It takes a very long time and seems like a terrible way to go about it.

franchyze923
  • Look into https://docs.python.org/2/library/glob.html – Sash Sinha Dec 20 '16 at 00:09
  • Unfortunately I'm actually using 2.7. I'm using this with some GIS functions that only support 2.7 – franchyze923 Dec 20 '16 at 00:17
  • His link is for Python 2? I don't see the issue. – deweyredman Dec 20 '16 at 00:21
  • From Python 3.5 onwards `glob` gained recursive support, which would have simplified this code. See [here](http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python). With Python 2 it's never going to be radically different from what the OP has already posted. – Paul Rooney Dec 20 '16 at 00:25
  • I am confused about what a 'feature' is. Is it a path to a directory like "C:\home\"? And how do you get the filename(s) to delete? Is it like "C:\home\*.jpg"? And since "folder" is not set in the code you show, what is it? – Marichyasana Dec 20 '16 at 00:27
  • This will locate all files with the given extensions in the current working directory and all subdirectories: `dir *.cpp *.h *.java /b/s` Maybe you can use that instead of walk. – Marichyasana Dec 20 '16 at 00:30
  • Marichyasana, yes a feature is a path to a directory like C:\home\1.jpg. I get the filenames to delete earlier in the script. Folder, is just a directory on my computer containing folders and files. – franchyze923 Dec 20 '16 at 00:33

3 Answers


You should only visit each directory once. You can use sets to compare the list of file names in a given directory to your delete list, so that finding the files to delete and the files to keep each becomes a single set operation. If you don't care about printing out the file names, it's rather compact:

delete_set = set(str(x) + '.jpg' for x in features_to_delete)
for root, dirs, files in os.walk(folder):
    for delete_name in delete_set.intersection(files):
        os.remove(os.path.join(root, delete_name))

But if you want to print as you go, you have to add a few intermediate variables:

delete_set = set(str(x) + '.jpg' for x in features_to_delete)
for root, dirs, files in os.walk(folder):
    files = set(files)
    delete_these = delete_set & files
    keep_these = files - delete_set
    print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)
    print 'delete these: {}'.format('\n '.join(delete_these))
    print 'keep these: {}'.format('\n '.join(keep_these))
    for delete_name in delete_these:
        os.remove(os.path.join(root, delete_name))
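
As a hedged sketch rather than part of the original answer, the same single-pass idea can be wrapped in a function with a dry-run switch, so the matches can be reviewed before anything is removed. The names `delete_matching` and `dry_run` are hypothetical and chosen only for illustration; the code avoids `print` so it should run unchanged on Python 2.7 and Python 3.

```python
import os


def delete_matching(folder, names, dry_run=True):
    """Walk `folder` once and collect files whose names are in `names`.

    Returns the list of matched paths. Files are only removed when
    dry_run is False (hypothetical safety switch, not in the answer above).
    """
    delete_set = set(names)
    matched = []
    for root, dirs, files in os.walk(folder):
        # One set intersection per directory replaces the nested name loop.
        for name in delete_set.intersection(files):
            full_path = os.path.join(root, name)
            matched.append(full_path)
            if not dry_run:
                os.remove(full_path)
    return matched
```

Calling `delete_matching(folder, [str(x) + '.jpg' for x in features_to_delete])` first, and then again with `dry_run=False` once the returned list looks right, keeps the directory walk to a single pass per call.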
tdelaney

Create a function to separate the recursive, glob-like functionality from your own deletion logic. Then just iterate over the files it yields and delete any that match your blacklist.

You can make a set to give improved performance matching the file names. The larger the list the greater the improvement, but for smaller lists it might be negligible.

from fnmatch import fnmatch
import os
from os import path

def globber(rootpath, wildcard):
    for root, dirs, files in os.walk(rootpath):
        for file in files:
            if fnmatch(file, wildcard):
                yield path.join(root, file)

features_to_delete = ['blah', 'oh', 'xyz']

todelete = {'%s.jpg' % x for x in features_to_delete}

print(todelete)
for f in globber('/home/prooney', "*.jpg"):
    # globber yields full paths, so compare only the file name to the set.
    if path.basename(f) in todelete:
        print('deleting file: %s' % f)
        os.remove(f)
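
For readers who are not tied to Python 2.7, the recursive `glob` support mentioned in the comments can stand in for a hand-written walker. The following is a minimal sketch that assumes Python 3.5 or later; `find_by_name` is a hypothetical helper name, and the function only returns the matching paths so that deletion stays a separate step.

```python
import glob
import os


def find_by_name(rootpath, names, pattern='*.jpg'):
    """Return files under `rootpath` whose file name is in `names`.

    Assumes Python 3.5+, where glob accepts recursive=True.
    """
    wanted = set(names)
    # Escape the root so special characters in it are taken literally;
    # '**' then matches the root itself and any depth of subdirectories.
    search = os.path.join(glob.escape(rootpath), '**', pattern)
    return [p for p in glob.iglob(search, recursive=True)
            if os.path.isfile(p) and os.path.basename(p) in wanted]
```

By default `glob` does not descend into directories whose names start with a dot, so this sketch may skip files that `os.walk` would find.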
Paul Rooney

See whether this code helps. It includes a timer that compares the running time of the two approaches: the original nested loop first, then a single walk using a set intersection.

import os
from timeit import default_timer as timer

features_to_delete = ['a','b','c']
start = timer()
for x in features_to_delete:

    name_checker = str(x) + '.jpg'
    print 'this is name checker {}'.format(name_checker)
    folder = '.'
    for root, dir2, files in os.walk(folder):
        print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)

        for b in files:
            if b.endswith('.jpg'):
                local_folder = os.path.join(folder, root)
                print 'Here is name of file {}'.format(b)
                print 'Here is name of name checker {}'.format(name_checker)
                counter = 0
                if b == name_checker:
                    counter += 1
                    print '{} needs to be deleted..'.format(b)
                    os.remove(os.path.join(local_folder, b))
                    print 'Removed {} \n'.format(os.path.join(local_folder, b))

                else:
                    print 'This file can stay {} \n'.format(b)
            else:
                pass

end = timer()
print(end - start)

start = timer()
features_to_delete = ['d','e','f']
matches = []
folder = '.'
# Build the target file names once, then visit each directory a single time.
features_to_delete = [str(e) + '.jpg' for e in features_to_delete]
print 'features' + str(features_to_delete)
for root, dirnames, filenames in os.walk(folder):
    for filename in set(filenames).intersection(features_to_delete):
        local_folder = os.path.join(folder, root)
        os.remove(os.path.join(local_folder, filename))
        print 'Removed {} \n'.format(os.path.join(local_folder, filename))
end = timer()
print(end - start)

Test

$ touch foo/bar/d.jpg
$ touch foo/bar/b.jpg
$ python deletefiles.py 
this is name checker a.jpg
This is the root directory at the moment:. The following are files inside of it
This is the root directory at the moment:./.idea The following are files inside of it
This is the root directory at the moment:./foo The following are files inside of it
This is the root directory at the moment:./foo/bar The following are files inside of it
Here is name of file d.jpg
Here is name of name checker a.jpg
This file can stay d.jpg 

Here is name of file b.jpg
Here is name of name checker a.jpg
This file can stay b.jpg 

this is name checker b.jpg
This is the root directory at the moment:. The following are files inside of it
This is the root directory at the moment:./.idea The following are files inside of it
This is the root directory at the moment:./foo The following are files inside of it
This is the root directory at the moment:./foo/bar The following are files inside of it
Here is name of file d.jpg
Here is name of name checker b.jpg
This file can stay d.jpg 

Here is name of file b.jpg
Here is name of name checker b.jpg
b.jpg needs to be deleted..
Removed ././foo/bar/b.jpg 

this is name checker c.jpg
This is the root directory at the moment:. The following are files inside of it
This is the root directory at the moment:./.idea The following are files inside of it
This is the root directory at the moment:./foo The following are files inside of it
This is the root directory at the moment:./foo/bar The following are files inside of it
Here is name of file d.jpg
Here is name of name checker c.jpg
This file can stay d.jpg 

0.000916957855225
features['d.jpg', 'e.jpg', 'f.jpg']
Removed ././foo/bar/d.jpg 

0.000241994857788
Niklas Rosencrantz