Run multiple python files in all subdirectories

Question

I have a directory containing multiple subdirectories, each with a different scraper. How would you go about writing a script that will cd into each of the subdirectories, run the scraper, cd back out, and then continue to the next one? What would be the best way to do this, if it is possible?

Example of how the directory looks:

- All_Scrapers (parent dir)
   - Scraper_one (sub dir folder)
       - scraper.py
   - Scraper_two (sub dir folder)
       - scraper.py
   - Scraper_three (sub dir folder)
       - scraper.py
   - all.py

All the scrapers have a main function:

if __name__ == "__main__":
    main()

Check out this QA for a bit of help: https://stackoverflow.com/questions/1186789/what-is-the-best-way-to-call-a-script-from-another-script — JamesR, Oct 11 '18 at 10:21

norok2 · Accepted Answer · 2018-10-11T14:44:37.903

One way of doing this is to walk through your directories and programmatically import the modules you need.

Assuming that the Scraper_X folders are all inside a subdirectory named scrapers, and that the batch_run.py script is in the directory containing scrapers (i.e. at the same path level as scrapers), the following script will do the trick:

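import os
import importlib

base_subdir = 'scrapers'

for root, subdirs, filenames in os.walk(base_subdir):
    for subdir in subdirs:
        # skip special directories such as __pycache__
        if not subdir.startswith('__'):
            print(root, subdir)
            # e.g. imports scrapers.Scraper_one.scraper
            submodule = importlib.import_module('.'.join((root, subdir, 'scraper')))
            submodule.main()
    # only the first level of subdirectories contains scrapers
    break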
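The import name above is built by gluing path pieces together with dots. A small sketch (`dotted_module_name` is a hypothetical helper, not part of this answer) shows why running this version from inside the scrapers folder, i.e. with base_subdir = '.', produces a name with two leading dots, which import_module() treats as a relative import (hence the TypeError reported in the comments), and one way to normalize the path first:

```python
import os


def dotted_module_name(root, subdir, module="scraper"):
    """Turn a directory path plus a subfolder name into a dotted import name."""
    # normpath collapses "./scrapers" to "scrapers"; "" and "." parts are dropped
    parts = [p for p in os.path.normpath(root).split(os.sep) if p not in ("", ".")]
    return ".".join(parts + [subdir, module])


# The naive join with root = '.' yields two leading dots (a relative name)
print(".".join((".", "Scraper_one", "scraper")))  # ..Scraper_one.scraper
print(dotted_module_name(".", "Scraper_one"))  # Scraper_one.scraper
print(dotted_module_name("scrapers", "Scraper_one"))  # scrapers.Scraper_one.scraper
```

Note that this only fixes the name; the resulting absolute import still requires the top-level folder to be importable from sys.path.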

EDIT

If the script is inside the base_subdir directory itself (next to the scraper folders, as all.py is in the question), the code can be adapted by slightly changing how import_module() is called:

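import os
import importlib

base_subdir = '.'

for root, subdirs, filenames in os.walk(base_subdir):
    for subdir in subdirs:
        # skip special directories such as __pycache__
        if not subdir.startswith('__'):
            print(root, subdir)
            # e.g. imports Scraper_one.scraper; the name has no leading dot,
            # so this is an absolute import and `root` is not actually used
            script = importlib.import_module('.'.join((subdir, 'scraper')), root)
            script.main()
    # only the first level of subdirectories contains scrapers
    break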

EDIT 2

Some explanations:

How is `import_module()` being used?

The import_module() line is what actually does the job. Roughly speaking, when it is used with only one argument, i.e.

alias = importlib.import_module("my_module.my_submodule")

it is equivalent to:

import my_module.my_submodule as alias

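Instead, when the name starts with one or more dots and a second argument is given, i.e.

alias = importlib.import_module(".my_submodule", "my_module")

the name is resolved relative to the package passed as the second argument, which is roughly equivalent to:

from my_module import my_submodule as alias

This second form is what makes relative imports work (i.e. names starting with . or .., which refer to the current or the parent package). If the name does not start with a dot, the second argument is ignored and an absolute import is performed. That is what actually happens in the EDIT code: `root` has no effect, and names like Scraper_one.scraper are found because the running script's own directory is on sys.path.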
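These cases can be checked directly against a standard-library package. Here json and its submodule json.decoder are used purely as stand-ins for my_module and my_submodule:

```python
import importlib
import json.decoder

# One argument: an absolute dotted name, like `import json.decoder as alias`
alias = importlib.import_module("json.decoder")
assert alias is json.decoder

# Leading dot plus a package: resolved against that package,
# roughly like `from json import decoder as alias`
alias = importlib.import_module(".decoder", "json")
assert alias is json.decoder

# No leading dot: the package argument is ignored (plain absolute import)
assert importlib.import_module("json", "os") is json

# Leading dot but no package: raises the same kind of TypeError seen in the comments
try:
    importlib.import_module(".decoder")
except TypeError as exc:
    print(exc)
```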

What is `if not subdir.startswith('__'):` doing?

When you import a module, Python compiles it to bytecode and caches the result as .pyc files in a __pycache__ directory next to the source. The aforementioned line ensures that, when walking through the directories, __pycache__ (actually, any directory whose name starts with __) is not processed as if it contained modules to import. Other kinds of filtering may be equally valid.
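As a sketch of one such alternative filter (`scraper_dirs` is a hypothetical helper, not from the answer), one could keep only first-level folders that actually contain a scraper.py, skipping both __-prefixed and hidden folders such as .git. The last line asks importlib.util.cache_from_source() where the bytecode for a given source file would be cached:

```python
import importlib.util
import os
import tempfile


def scraper_dirs(base):
    """Return the names of first-level subfolders of `base` holding a scraper.py."""
    found = []
    with os.scandir(base) as entries:
        for entry in entries:
            # skip plain files, __pycache__-style folders and hidden ones like .git
            if not entry.is_dir() or entry.name.startswith(("__", ".")):
                continue
            if os.path.isfile(os.path.join(entry.path, "scraper.py")):
                found.append(entry.name)
    return sorted(found)


# Build a throwaway tree loosely resembling the one in the question
with tempfile.TemporaryDirectory() as base:
    for name in ("Scraper_one", "Scraper_two", "__pycache__", "notes"):
        os.mkdir(os.path.join(base, name))
    for name in ("Scraper_one", "Scraper_two", "__pycache__"):
        open(os.path.join(base, name, "scraper.py"), "w").close()
    print(scraper_dirs(base))  # ['Scraper_one', 'Scraper_two']

# Normally a .pyc path inside Scraper_one/__pycache__/
print(importlib.util.cache_from_source(os.path.join("Scraper_one", "scraper.py")))
```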

So I have a file 'all.py' in the same directory as the scrapers; I copied the code and got this error: `TypeError: the 'package' argument is required to perform a relative import for '..breckland_scraper.scraper'` — Liban West, Oct 11 '18 at 11:35
It has to be in the parent directory, or you have to adapt the code accordingly. — norok2, Oct 11 '18 at 11:42
It works, thank you very much! However, could I get an explanation for this line of code: `script = importlib.import_module('.'.join((subdir, 'scraper')), root)` — Liban West, Oct 11 '18 at 14:25
Also, what's the purpose of `if not subdir.startswith('__'):`? — Liban West, Oct 11 '18 at 14:31

score 0 · Answer 2 · answered Oct 11 '18 at 10:21

You may want to check the os.walk function, which traverses the directory tree, and run the script (or the main function you wrap the script's contents into) in each directory it visits.

For example:

import os, subprocess, sys

for root, dirs, files in os.walk(".", topdown=False):
    if "scraper.py" in files:  # run each scraper from inside its own folder
        subprocess.run([sys.executable, "scraper.py"], cwd=root)
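The question literally asks to cd into each folder, run the scraper, and cd back out. Below is a minimal sketch of that, assuming each scraper.py defines main() and imports only installed packages (importing sibling modules from its own folder would additionally need that folder on sys.path); `run_in_each_subdir` is a hypothetical name. It loads each file with importlib.util.spec_from_file_location(), so folder names need not be valid Python identifiers, and uses try/finally so the working directory is restored even if a scraper raises:

```python
import importlib.util
import os


def run_in_each_subdir(base, filename="scraper.py"):
    """cd into each subfolder of `base`, load `filename`, call its main(), cd back."""
    base = os.path.abspath(base)
    results = {}
    for name in sorted(os.listdir(base)):
        folder = os.path.join(base, name)
        path = os.path.join(folder, filename)
        # skip __pycache__-like folders and anything without the scraper file
        if name.startswith("__") or not os.path.isfile(path):
            continue
        previous = os.getcwd()
        os.chdir(folder)  # "cd" into the scraper's folder
        try:
            # a unique module name per folder keeps the scraper.py files apart
            spec = importlib.util.spec_from_file_location(f"scraper_{name}", path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            results[name] = module.main()
        finally:
            os.chdir(previous)  # "cd" back out, even if the scraper failed
    return results
```

From all.py it could be called as run_in_each_subdir(os.path.dirname(os.path.abspath(__file__))). Because each module is loaded under its own name (scraper_<folder>), the `if __name__ == "__main__":` guard in the scrapers does not fire, and main() is called explicitly instead.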
