0

I have a directory structure that resembles the following:

Dir1
Dir2
Dir3
Dir4
    L SubDir4.1
    L SubDir4.2
    L SubDir4.3

I want to generate a list of files (with full paths) that include all the contents of Dirs1-3, but only SubDir4.2 inside Dir4. The code I have so far is

import fnmatch
import os

for root, dirs, files in os.walk( '.' )
    if 'Dir4' in dirs:
        if not 'SubDir4.2' in 'Dir4':
            dirs.remove( 'Dir4' )
    for file in files
        print os.path.join( root, file )

My problem is that the part where I attempt to exclude any file that does not have SubDir4.2 in its path is excluding everything in Dir4, including the things I would like to keep. How should I amend the above to do what I want?

Update 1: I should add that there are a lot of directories below Dir4, so manually listing them in an excludes list isn't a practical option. I'd like to be able to specify SubDir4.2 as the only subdirectory within Dir4 to be read.

Update 2: For reasons outside of my control, I only have access to Python version 2.4.3.

  • I'm confused. I could have read that wrong, but you say that you only want SubDir4.2 within Dir4, then you say that the code is excluding things in Dir4 that you want. Are there things in Dir4 you want other than the contents of SubDir4.2? – JerseyMike Jul 16 '12 at 12:19
  • Sorry for the confusion. I would like to exclude everything in `Dir4` **except** `SubDir4.2`, but the code I have written is excluding everything in `Dir4` including `SubDir4.2`, and I would like to know how to fix it so that it does the former. –  Jul 16 '12 at 12:30
  • No problem. Just wanted to make sure I understood. I submitted a solution that matches what you were trying to accomplish. My brain hasn't embraced the "Pythonic Way" yet, so MarcO's solution is hard for me to read, but I like it. :) – JerseyMike Jul 16 '12 at 13:49

3 Answers

1

There are a few typos in your snippet. I propose this:

import os

def any_p(iterable):
    for element in iterable:
        if element:
            return True
    return False

include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2', 'Dir3', 'Dir2'] # List every folder you want to include here


for root, dirs, files in os.walk( '.' ):
    dirs[:] = [d for d in dirs if any_p(d in os.path.join(root, q_inc) for q_inc in include_dirs)]

    for file in files:
        print os.path.join(root, file)

EDIT: Following the comments, I have changed this to an include list instead of an exclude list.

EDIT2: Added an any_p function (an any() equivalent for Python versions < 2.5).

EDIT3bis: if you have other subfolders with the same name 'SubDir4.2' in other folders, you can use the following to specify the location:

include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2']

Assuming you have a Dir1/SubDir4.2.

If there are a lot of those, then you may want to refine this approach with fnmatch, or possibly a regex query.
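As a rough illustration of the fnmatch refinement mentioned above, here is a minimal sketch. The names `matches_any`, `select_files`, `restricted` and `allowed` are made up for this example and are not part of the original answer. It filters file paths instead of pruning directories, so it still walks the whole tree, and it avoids `any()` with Python 2.4 in mind, although I have not run it on that version.

```python
import fnmatch
import os

def matches_any(path, patterns):
    """Return True if path matches at least one fnmatch-style pattern."""
    for pattern in patterns:
        if fnmatch.fnmatch(path, pattern):
            return True
    return False

def select_files(top, restricted, allowed):
    """Collect full file paths under top.

    A file whose path (relative to top) matches a pattern in `restricted`
    is kept only if it also matches a pattern in `allowed`.
    """
    result = []
    for root, dirs, files in os.walk(top):
        for name in files:
            full = os.path.join(root, name)
            # Path relative to top, using '/' so patterns look the same everywhere
            rel = full[len(top):].lstrip(os.sep).replace(os.sep, '/')
            if matches_any(rel, restricted) and not matches_any(rel, allowed):
                continue
            result.append(full)
    return result
```

For the layout in the question, `select_files('.', ['Dir4/*'], ['Dir4/SubDir4.2/*'])` should keep everything outside Dir4 plus the contents of Dir4/SubDir4.2. Note that `*` in fnmatch also matches path separators, so these patterns cover nested subdirectories too.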

Marc-Olivier Titeux
  • Thanks a lot for the reply, but I failed to mention in my question that there were a lot of subdirectories below `Dir4`, so listing them all off manually isn't a practical solution. –  Jul 16 '12 at 12:16
  • Thanks again, but it turns out I only have access to python 2.4.3 (work computer) and `any()` wasn't introduced until 2.5. –  Jul 16 '12 at 12:37
  • No problem! This is not as flexible as you may like if your pattern diverges more than that, though... – Marc-Olivier Titeux Jul 16 '12 at 13:03
  • In my test, this is removing ALL directories including Dir1, Dir2 and Dir3. – JerseyMike Jul 16 '12 at 13:58
  • I also don't think the `include_dirs = ['Dir4/SubDir4.2']` will work since the `dirs` list is only one level deep, the list of directories directly under the current `root`. – JerseyMike Jul 16 '12 at 14:05
  • You're right, I was focusing on extracting the Subdir only. One fast "patch" would be to add Dir1, Dir2 and Dir3 to include_dirs (not good if they're many). I'll try to find a better way – Marc-Olivier Titeux Jul 16 '12 at 14:07
  • @JerseyMike I have added the edit for `include_dirs = ['Dir4/SubDir4.2']`. Although, I could not come up with something "nicer" as for the include of Dir1, Dir2, Dir3. If you have something better, help yourself! ;) Thanks for the testing. – Marc-Olivier Titeux Jul 16 '12 at 14:35
  • @MarcO, I really like your use of comprehensions (even though I'm not good with them yet). I'm just not sure they will work with all of the exceptions in this question. I played with your solution and couldn't get it to do everything. I was able to put together something that worked, but it's much more old-school and not nearly as elegant as yours. – JerseyMike Jul 16 '12 at 17:46
  • Your solution is probably better suited to the problem at hand. Even though it takes a more "direct" approach, it does the job and is understandable by whoever reads it. – Marc-Olivier Titeux Jul 17 '12 at 08:18
0

I altered mstud's solution to give you what you are looking for:

import os

for root, dirs, files in os.walk('.'):
    # Split the root into its path parts
    tmp = root.split(os.path.sep)
    # If the length of the path is long enough to be your path AND
    # the second-to-last part of the path is Dir4 AND
    # the last part of the path is NOT SubDir4.2 THEN
    # skip this directory's files.
    if (len(tmp) > 2) and (tmp[-2] == 'Dir4') and (tmp[-1] != 'SubDir4.2'):
        continue
    # If we aren't in Dir4, print the file paths.
    if tmp[-1] != 'Dir4':
        for file in files:
            print os.path.join(root, file)

In short, the first "if" skips the printing of any directory contents under Dir4 that aren't SubDir4.2. The second "if" skips the printing of the contents of the Dir4 directory.
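One caveat: `continue` only skips the files of the directory currently being visited, so directories nested deeper inside the unwanted subdirectories (for example Dir4/SubDir4.1/something) would still be walked and printed. A variation that prunes `dirs` in place, which `os.walk` honours in its default top-down mode, could look like the sketch below. The function name `list_files` and its `parent` and `keep` parameters are hypothetical names chosen for this example. It sticks to features that I believe exist in Python 2.4, but I have not tested it there.

```python
import os

def list_files(top, parent='Dir4', keep='SubDir4.2'):
    """Return full file paths under top, entering `parent` only through `keep`."""
    result = []
    restricted_root = os.path.join(top, parent)
    for root, dirs, files in os.walk(top):
        if root == restricted_root:
            # Prune in place so os.walk never descends into the siblings of `keep`
            dirs[:] = [d for d in dirs if d == keep]
            # Skip files that sit directly inside `parent`
            continue
        for name in files:
            result.append(os.path.join(root, name))
    return result
```

Printing each entry of `list_files('.')` should then give the contents of Dir1 to Dir3 plus everything below Dir4/SubDir4.2, at any depth, without visiting the other subdirectories of Dir4 at all.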

JerseyMike
-1
for root, dirs, files in os.walk('.'):
    tmp = root.split(os.path.sep)
    if len(tmp)>2 and tmp[-2]=="Dir4" and tmp[-1]=="SubDir4.2":
        continue

    for file in files:
        print os.path.join(root, file)
mstud