How can I build sub-commands to bundle scripts that should also work independently?

Question

I am currently working on a research project (my bachelor's thesis) on handwriting recognition. I have written a lot of Python scripts so far and I would like to make them useful for other people. So I created a project on PyPI: https://pypi.python.org/pypi/hwrt/

Currently, it contains only two executable scripts: backup.py and view.py. Once the package is installed via pip, I can call them directly:

$ backup.py --help
usage: backup.py [-h] [-d FOLDER] [-s] [-o]

Download raw data from online server and back it up (e.g. on dropbox)
handwriting_datasets.pickle.

optional arguments:
  -h, --help            show this help message and exit
  -d FOLDER, --destination FOLDER
                        where do write the handwriting_dataset.pickle
                        (default: /home/moose/Downloads/write-math/archive
                        /raw-datasets)
  -s, --small           should only a small dataset (with all capital letters)
                        be created? (default: False)
  -o, --onlydropbox     don't download new files; only upload to dropbox
                        (default: False)

$ view.py --help
usage: view.py [-h] [-i ID] [--mysql MYSQL] [-m FOLDER]

Display a raw_data_id.

optional arguments:
  -h, --help            show this help message and exit
  -i ID, --id ID        which RAW_DATA_ID do you want?
  --mysql MYSQL         which mysql configuration should be used?
  -m FOLDER, --model FOLDER
                        where is the model folder (with a info.yml)?

I achieved this with the scripts entry in setup.py:

try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup

config = {
    'name': 'hwrt',
    'version': '0.1.19',
    'author': 'Martin Thoma',
    'author_email': 'info@martin-thoma.de',
    'packages': ['hwrt'],
    'scripts': ['bin/backup.py', 'bin/view.py'],
    'url': 'https://github.com/MartinThoma/hwrt',
    'license': 'MIT',
    'description': 'Handwriting Recognition Tools',
    'long_description': """A toolkit for handwriting recognition. It was
    developed as part of the bachelors thesis of Martin Thoma.""",
    'install_requires': [
        "argparse",
        "theano",
        "nose",
    ],
    'keywords': ['HWRT', 'recognition', 'handwriting', 'on-line'],
    'download_url': 'https://github.com/MartinThoma/hwrt',
}

setup(**config)

However, I would rather call them like this:

$ hwrt backup --help
(just what came before for 'backup.py --help')
$ hwrt view --help
(just what came before for 'view.py --help')
$ hwrt --help
(a list of all sub-commands)

I know that this can be done with sub-commands and argparse. However, this would mean I would have to create a new script in which I bundle all commands for argparse. But I would also like the scripts to work independently. It just feels more logical to me to define command line parameters that only matter for backup.py in backup.py itself and not in another file.

Is there a way to adjust my scripts so that they "discover" the scripts in the bin folder and add all of them as sub-commands?

score 0 · Answer 1 · edited May 23 '17 at 12:11

This might be a case for using parents.

For example, let's assume both of your scripts create a parser object when loaded (or have a function that creates a parser):

import argparse
from backup import parser as backup_parser
from view import parser as view_parser

if __name__=='__main__':
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(dest='cmd')
    # or use set_defaults() on each subparser, as described in the argparse documentation
    backup = subparsers.add_parser('backup', add_help=False, parents=[backup_parser])
    view = subparsers.add_parser('view', add_help=False, parents=[view_parser])
    args = parser.parse_args()

This should print the appropriate help messages. args.cmd will identify the subcommand, and the other attributes will be the respective arguments. The backup subparser will be a clone of the parser imported from backup.py. (I haven't tested this script, so there may be some typos or bugs, but it gives the general idea.)
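
Here is a slightly fuller sketch of the same idea that runs on its own. It assumes each script exposes a function that builds its parser and a function that does the work; get_backup_parser, get_view_parser, backup_main and view_main are hypothetical stand-ins for what backup.py and view.py would provide, and the arguments are trimmed-down versions of the ones in the question. Each subparser reuses the parent's arguments via parents and remembers its handler via set_defaults:

```python
import argparse


def get_backup_parser():
    """Stand-in for a parser factory that backup.py would expose (hypothetical)."""
    parser = argparse.ArgumentParser(description="Back up raw data.")
    parser.add_argument("-d", "--destination", metavar="FOLDER", default=".")
    parser.add_argument("-s", "--small", action="store_true")
    return parser


def get_view_parser():
    """Stand-in for a parser factory that view.py would expose (hypothetical)."""
    parser = argparse.ArgumentParser(description="Display a raw_data_id.")
    parser.add_argument("-i", "--id", type=int, default=0)
    return parser


def backup_main(args):
    # Placeholder for the real work done by backup.py.
    return "backup to %s (small=%s)" % (args.destination, args.small)


def view_main(args):
    # Placeholder for the real work done by view.py.
    return "view %d" % args.id


def get_hwrt_parser():
    parser = argparse.ArgumentParser(prog="hwrt")
    subparsers = parser.add_subparsers(dest="cmd")
    for name, parent, func in [
        ("backup", get_backup_parser(), backup_main),
        ("view", get_view_parser(), view_main),
    ]:
        # add_help=False because the parent already defines -h/--help.
        # parents= copies the arguments but not the description, so pass it on.
        sub = subparsers.add_parser(
            name, add_help=False, parents=[parent],
            description=parent.description)
        sub.set_defaults(func=func)
    return parser


def main(argv=None):
    parser = get_hwrt_parser()
    args = parser.parse_args(argv)
    func = getattr(args, "func", None)
    if func is None:  # no sub-command given
        parser.print_help()
        return None
    return func(args)


if __name__ == "__main__":
    print(main(["backup", "-s", "-d", "/tmp"]))
    print(main(["view", "-i", "7"]))
```

In the real scripts, backup.py would keep its own `if __name__ == '__main__':` block that only calls its parser's parse_args() and then its main function, so it still works stand-alone while remaining importable.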

How to handle CLI subcommands with argparse discusses a couple of ways of handling subcommands.

IPython uses argparse to handle both the main interface and many of the magic commands. It populates its parsers with arguments and values from the config files. That way, a large number of parameters can be set with default configs, customized configs, or on the command line.

I've just tried it and at a first glance it seemed to work. I had to move the `parser = ArgumentParser(description=__doc__)` and related stuff from backup.py and view.py outside of the `if __name__=='__main__':` block, but that would be ok I guess. However, when I wanted to execute `hwrt backup -whatever parameter` it did not execute. I guess the problem is that the "executing" part was still under `if __name__=='__main__'`. So I completely removed `if __name__=='__main__'` from view.py and backup.py. But then `./hwrt --help` did only show `backup.py`. — Martin Thoma, Oct 09 '14 at 20:49
Yes, `backup.py` (and `view.py`) would have to be organized so that the parser creator code and the main execution code are importable. Only the actual `parse_args` call and the call of the main code would be guarded by the 'if main'. I suppose my code should have been organized in the same way - so it too could be imported! — hpaulj, Oct 09 '14 at 22:04
An alternative is to use `parse_known_args` to find out which script you want to run, and then invoke `backup.py` or `view.py` as separate processes (with `subprocess` or `multiprocessing`), giving them the unknown argument strings. — hpaulj, Oct 09 '14 at 22:12
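
The `parse_known_args` alternative from the last comment could look roughly like the sketch below. It is untested against the real hwrt scripts; the `bin` folder location, the `COMMANDS` tuple and the helper names `split_command` and `build_call` are assumptions made for illustration. The wrapper only picks out the sub-command and passes every other argument string through unchanged, so each script keeps full ownership of its own options:

```python
import argparse
import os
import subprocess
import sys

# Assumed layout: the stand-alone scripts live in a bin/ folder.
SCRIPT_DIR = "bin"
COMMANDS = ("backup", "view")


def split_command(argv):
    """Return the sub-command and the argument strings the wrapper does not know."""
    # add_help=False so that "-h"/"--help" is forwarded to the script itself.
    parser = argparse.ArgumentParser(prog="hwrt", add_help=False)
    parser.add_argument("cmd", choices=COMMANDS)
    args, rest = parser.parse_known_args(argv)
    return args.cmd, rest


def build_call(cmd, rest):
    """Build the command line that runs the script in a separate process."""
    script = os.path.join(SCRIPT_DIR, cmd + ".py")
    return [sys.executable, script] + rest


def main(argv=None):
    cmd, rest = split_command(sys.argv[1:] if argv is None else argv)
    # Returns the exit code of the child process.
    return subprocess.call(build_call(cmd, rest))


if __name__ == "__main__":
    # Demonstrates the split only; no script is actually started here.
    print(split_command(["backup", "-d", "/tmp", "-s"]))
```

With this layout `hwrt backup --help` shows the help of backup.py itself. A plain `hwrt --help` would need separate handling in the wrapper, because the wrapper's own help option is switched off here.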
