0

Having a tree structure as follows:

custom_test/
├── 110/
│   └── 1548785454_CO_[1].txt
├── 120/
│   └── 1628785454_C4_[1].txt
├── 13031/
│   └── 1544725454_C2_[1].txt
└── test_results/
    ├── resulset1.txt
    └── hey.txt
script.py  <-- this is the script which runs the Python code

I want to list the files, prefixed by their subfolder, for every folder except test_results (I want to ignore this folder). Using the minimal example above, my desired output is:

['110\\1548785454_CO_[1].txt', '120\\1628785454_C4_[1].txt', '13031\\1544725454_C2_[1].txt']

This is my attempt. It produces the output, but it also includes the files from the test_results folder:

import glob
import os
deploy_test_path = "custom_test"
print([os.path.join(os.path.basename(os.path.relpath(os.path.join(filename, os.pardir))), os.path.basename(filename)) for filename in glob.iglob(deploy_test_path + '**/**', recursive=True) if os.path.isfile(filename)])

Without list comprehension (for easier understanding):

deploy_test_path = "custom_test"
for filename in glob.iglob(deploy_test_path + '**/**', recursive=True):
    if os.path.isfile(filename):
        a = os.path.join(os.path.basename(os.path.relpath(os.path.join(filename, os.pardir))), os.path.basename(filename))
        print(a)

How can I achieve my goal? I know I can do it by removing the test_results entries from the list afterwards, but is there a more elegant and Pythonic way to do this?

Thanks in advance

Avión
  • Use `glob.iglob('custom_test/**[!test_results]/**', recursive=True)`. This will exclude the `test_results` folder (only for the first level). Also see [this](https://stackoverflow.com/questions/20638040/glob-exclude-pattern) post, which shows the exclusion rule and how you can use sets to exclude two different patterns. – Thymen Dec 10 '20 at 10:07

2 Answers

0

Whenever I need to manipulate paths, I turn to `pathlib`.

Here is how I would do it, more or less:

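from pathlib import Path

base = Path("custom_test")  # named base to avoid shadowing the built-in dir()
files = base.rglob("*")  # every file and folder below base, recursively
# Drop entries that sit directly inside a test_results folder
res = [f.relative_to(base) for f in files if not f.match("test_results/*")]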

In a one-liner:

from pathlib import Path

res = [f.relative_to("custom_test") for f in Path("custom_test").rglob("*") if not f.match("test_results/*")]

If you only need the files, you can use rglob("*.*") instead (this only matches names that contain a dot), or filter with is_file():

base = Path("custom_test")
res = [f.relative_to(base) for f in base.rglob("*") if not f.match("test_results/*") and f.is_file()]
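Note that `match("test_results/*")` only filters entries directly inside a `test_results` folder. If files nested deeper should be skipped too, and you want plain strings as in the question, something like the sketch below may help. The helper name `list_files` and its `ignored` parameter are made up for illustration; it checks the folder names in each relative path instead of using a pattern.

```python
from pathlib import Path


def list_files(root, ignored=("test_results",)):
    """Return the files below root as strings relative to root,
    skipping anything located under a folder named in `ignored`."""
    root = Path(root)
    result = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # rel.parts[:-1] holds the folder names between root and the entry
        if path.is_file() and not any(part in ignored for part in rel.parts[:-1]):
            result.append(str(rel))
    return sorted(result)
```

Calling `list_files("custom_test")` from the folder that contains `script.py` should then return only the files under `110`, `120` and `13031`, with the path separator of the current platform.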
  • This includes also the directories. I just want to get the files. – Avión Dec 10 '20 at 10:13
  • And it also includes the `custom_test` on the output. Please, see the desired output of the post. – Avión Dec 10 '20 at 10:16
  • Please, check my second comment! – Avión Dec 10 '20 at 10:20
  • Ok check my second edit then ;-) And feel free to read the Pathlib doc, it's very powerful! – François Degrave Dec 10 '20 at 10:26
  • I still don't get the proper output with your code. It outputs the folders. This one works: `res = [str(f.relative_to(*f.parts[:1])) for f in Path("custom_test").rglob("*") if not f.match("test_results/*") and f.is_file()] ` – Avión Dec 10 '20 at 10:31
  • Indeed, in my answer I don't convert the `Path` objects into strings, because it is usually a good idea to keep working with such objects rather than strings throughout your code. They are much easier to manipulate for file-related operations. The only reason to turn them into strings is to produce output or to pass them to modules that are not pathlib-compatible (Python's core modules are, nowadays). And my answer does not output the folders; the `.is_file()` part takes care of that. – François Degrave Dec 10 '20 at 10:45
-1

I had the same situation and did the following:

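import os

IGNORE_FOLDERS = ("test_results", ".git")  # as many folders as you need to ignore


def get_data(file_path):
    # Only the first level below file_path is inspected
    root, dirnames, _ = next(os.walk(file_path))
    for dirname in (d for d in dirnames if d not in IGNORE_FOLDERS):
        # List the files that sit directly inside each non-ignored subfolder
        for name in sorted(os.listdir(os.path.join(root, dirname))):
            if os.path.isfile(os.path.join(root, dirname, name)):
                print(os.path.join(dirname, name))  # or save them to a variable if you like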
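
If the subfolders can themselves contain folders, a recursive variant is possible by pruning `dirnames` in place, which `os.walk` honours when walking top-down (the default). This is only a sketch; the generator name `walk_files` is hypothetical and not part of the code above.

```python
import os

IGNORE_FOLDERS = {"test_results", ".git"}


def walk_files(top, ignore=IGNORE_FOLDERS):
    """Yield file paths relative to top, skipping ignored folders at any depth."""
    for root, dirnames, filenames in os.walk(top):
        # Slice assignment edits the list os.walk uses to decide
        # where to descend, so ignored folders are never visited
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            yield os.path.relpath(os.path.join(root, name), top)
```

`list(walk_files("custom_test"))` would also include any files placed directly in `custom_test`; add a check on `root` if those should be left out.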

    
Nick