
I'm a Python beginner looking for help with searching a list of directories using os.walk.

I'm pulling a list of directories from an SQL database. These directories will likely have different drive letters, and some may be UNC paths. I need to search through them to find files with a specific name and delete them. Because a file could be in any of the directories, all of them have to be searched. The number of directories isn't known in advance, so my thought was to store them in a list and have os.walk look at every directory in that list.

def get_location():
    c.execute('SELECT ADDRESS FROM DIRECTORY')
    data = c.fetchall()
    SQLlist = [row for row in data]
    return SQLlist


addresslist = get_location()


def FileDeleter():
    for root, dirs, files in chain.from_iterable(os.walk(addresslist[0:], topdown=False) for path in (str(addresslist[0:]))):
        for file in files:
            if correctID in file:
                if file.endswith('.custextn'):
                    os.remove(os.path.join(root, file))

This is how the code currently stands, but previously I've tried:

    for root, dirs, files in os.walk(addresslist[0:], topdown=False):

    for root, dirs, files in chain.from_iterable(os.walk(addresslist[0:], topdown=False)):

It seems that os.walk doesn't accept lists (or tuples). If I pass addresslist[0] or addresslist[1] it works. However, as I don't know how many addresses there could be, I can't just store X addresses as separate variables and duplicate the function.

The error I get when running my code is:

'TypeError: expected str, bytes or os.PathLike object, not list'

Finally, I've tested with a hardcoded list of addresses just to rule out an issue with how the list is being extracted from the database, e.g.:

addresslist = ['C:\\Subfolder1\\Subfolder2', 'D:\\Subfolder1\\Subfolder2']

and, because of unpacking errors:

x,y = ['C:\\Subfolder1\\Subfolder2', 'D:\\Subfolder1\\Subfolder2']

Thanks

Michael Butscher
Meshi
  • Wrap it in another for-loop to process the items of `addresslist` one by one. Maybe move it to an additional function to avoid nesting the loops too deeply. – Michael Butscher Dec 23 '17 at 22:45
  • Thank you, I understand about nesting, so I have tried to create a loop outside which prints both locations just fine. However, when I set it to return and print the variable I only get the first entry of the list. Replacing 'addresslist[0:]' with this new variable in the code doesn't appear to remove anything. – Meshi Dec 24 '17 at 00:04

2 Answers


Your first for loop doesn't do what you want it to. It's close, but not quite.

for root, dirs, files in chain.from_iterable(os.walk(addresslist[0:], topdown=False) for path in (str(addresslist[0:])))

What your loop is currently doing is converting your addresslist into a string and then iterating over each character of that string, binding each one to the path variable. For every character it tries to chain another os.walk generator, but each call receives addresslist[0:], which is still the whole list, while os.walk needs a single path. That is where the TypeError comes from. The path variable is also never used anywhere else in your code.
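To make the two problems concrete, here is a small illustration using the hardcoded paths from the question. It touches no files; it only shows what `str()` and slicing return.

```python
addresslist = ['C:\\Subfolder1\\Subfolder2', 'D:\\Subfolder1\\Subfolder2']

# str() on a list gives its printed representation as one long string.
as_text = str(addresslist)

# Iterating over that string produces single characters, not paths.
first_items = list(as_text)[:4]
print(first_items)

# A full slice of a list is still a list, which os.walk() rejects.
print(type(addresslist[0:]))
```

So the loop variable ends up holding things like `[` and `'`, and os.walk is handed a list either way.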

This should be:

for path in addresslist:
   # it looks like you are worried that not all paths will be strings
   # if that's really a concern, then leave this next line.
   # Otherwise, I think it is safe to delete it
   path = str(path) 
   for root, dirs, files in os.walk(path, topdown=False):

That will take each element from addresslist (which is the path you want to search) and do an os.walk over it. I don't think you need to be using chain here at all.
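Putting the outer loop together with the deletion logic from the question might look roughly like the sketch below. The function name `delete_matching_files`, its parameters, and the returned list of removed paths are assumptions made for illustration; `correct_id` stands in for the question's `correctID`.

```python
import os


def delete_matching_files(addresslist, correct_id, extension='.custextn'):
    """Walk every directory in addresslist and delete matching files.

    Returns the paths that were removed so the caller can log them.
    (Hypothetical helper, not part of the original code.)
    """
    removed = []
    for path in addresslist:
        # os.walk() takes exactly one starting directory per call.
        for root, dirs, files in os.walk(path, topdown=False):
            for name in files:
                # Same two checks as in the question, combined.
                if correct_id in name and name.endswith(extension):
                    full_path = os.path.join(root, name)
                    os.remove(full_path)
                    removed.append(full_path)
    return removed
```

Since this deletes files, it is probably worth running it first with the `os.remove` line swapped for a `print` to check which files it would pick up.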

If you want to use chain (which isn't necessary) you can follow the outline provided by this SO post: os.walk multiple directories at once.

for root, dirs, files in chain.from_iterable(os.walk(str(path)) for path in addresslist):
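For completeness, the chain variant can be wrapped in a small helper so the import and the generator expression are visible in one place. The name `walk_all` is hypothetical; the behaviour should match the nested loop version.

```python
import os
from itertools import chain


def walk_all(addresslist):
    """Yield (root, dirs, files) for every directory tree in addresslist."""
    # The generator expression creates one os.walk() per path;
    # chain.from_iterable() joins them into a single stream.
    return chain.from_iterable(
        os.walk(path, topdown=False) for path in addresslist
    )
```

It would then be used as `for root, dirs, files in walk_all(addresslist):`, which saves one level of indentation at the cost of being a little less obvious to read.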

One more thing that you should do is have addresslist be a parameter that is passed into your function.

def FileDeleter(addresslist):
   # your function code here
# then you need to actually call the function
addresses = get_location()
FileDeleter(addresses)

Relying on global variables can get you in a lot of trouble as your code becomes more complex.

TheF1rstPancake
  • Hi, thanks for your comments. I've tried both suggestions but still not having any luck. Using this method (without path = str(path)) I get the error 'TypeError: expected str, bytes or os.PathLike object, not tuple', when converting to a string it doesn't action anything. – Meshi Dec 24 '17 at 00:00
  • @Meshi A `print(repr(path))` as first line in the for-loop shows what `path` really is. – Michael Butscher Dec 24 '17 at 01:42
  • Agree with @MichaelButscher here. Also, are you sure the data returned from `get_location` is actually a list of paths? – TheF1rstPancake Dec 24 '17 at 02:20
  • @MichaelButscher If I just print addresslist I get ('C:\\Subfolder1\\Subfolder2',) ('D:\\Subfolder1\\Subfolder2',) which I believe is a list of paths. If I set the function to return 'path' and then assign it to a variable to print, e.g. print(addresslist2), it only prints the first entry. In response to the other question, I believe these must be a list of paths as they work fine when using os.walk(addresslist[0]) to select just the first entry. I'm confused as to why. – Meshi Dec 24 '17 at 10:07
  • With a bit more testing I think I've found the issue. Each entry in the list, when printed, was wrapped in an extra set of either [ ] or ( ). I got the system to print the list with just one set of [ ], e.g. ['address1', 'address2'], and this now works. The problem now is that I don't know how to get my SQL-extracted list to not have the additional brackets, any ideas? Thanks for your help so far, adding in the for loop for the list worked! – Meshi Dec 24 '17 at 11:20
  • sqlite3 returns a list of tuples when you fetch data. You can change your `get_location` function to use `SQLlist = [row[0] for row in data]` if you know that each row you get back will only have one piece of data in it. – TheF1rstPancake Dec 24 '17 at 18:27

I've got this working now and wanted to confirm what I did.

There were two issues. I needed the additional for loop suggested by @TheF1rstPancake and @Michael Butscher.

The second problem was extracting the list of directories from the database.

def get_location():
    c.execute('SELECT ADDRESS FROM DIRECTORY')
    data = c.fetchall()
    SQLlist = [row for row in data]
    return SQLlist

I was using the above, but found that print(data) showed a list of tuples: each row came back as a one-element tuple rather than a plain string, so os.walk was being handed a tuple instead of a path. The result looked like

[('C:\\Subfolder1\\Subfolder2',), ('D:\\Subfolder1\\Subfolder2',)]

The solution I used is below

def get_location():
    c.execute('SELECT ADDRESS FROM DIRECTORY')
    data = c.fetchall()
    SQLlist = []
    for row in range(len(data)):
        SQLlist.append(data[row][0])
    return SQLlist

This now gives me the list:

['C:\\Subfolder1\\Subfolder2', 'D:\\Subfolder1\\Subfolder2']

When running this list through the additional for loop os.walk now correctly searches all the directories.
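For anyone wanting to reproduce this without the real database, here is a self-contained sketch. It assumes the sqlite3 module (as mentioned in the comments) and uses an in-memory database as a stand-in for the real one. It also uses the list comprehension suggested in the comments, and passes the cursor in as a parameter, which differs from my original code.

```python
import sqlite3

# In-memory database standing in for the real one (assumption).
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE DIRECTORY (ADDRESS TEXT)')
c.executemany(
    'INSERT INTO DIRECTORY (ADDRESS) VALUES (?)',
    [('C:\\Subfolder1\\Subfolder2',), ('D:\\Subfolder1\\Subfolder2',)],
)


def get_location(cursor):
    cursor.execute('SELECT ADDRESS FROM DIRECTORY')
    # Each fetched row is a 1-tuple, so take its only element.
    return [row[0] for row in cursor.fetchall()]


addresslist = get_location(c)
print(addresslist)
```

This should give the same flat list of strings as the loop with `data[row][0]`, just written more compactly.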

Thanks for everyone's help, really appreciate this!

Meshi