
When I'm writing shell scripts, I often find myself spending most of my time (especially when debugging) dealing with argument processing. Many scripts I write or maintain are easily more than 80% input parsing and sanitization. I compare that to my Python scripts, where argparse handles most of the grunt work for me, and lets me easily construct complex option structures and sanitization / string parsing behavior.

I'd love, therefore, to be able to have Python do this heavy lifting, and then get these simplified and sanitized values in my shell script, without needing to worry any further about the arguments the user specified.

To give a specific example, many of the shell scripts where I work have been defined to accept their arguments in a specific order. You can call `start_server.sh --server myserver --port 80`, but `start_server.sh --port 80 --server myserver` fails with `You must specify a server to start.` This makes the parsing code a lot simpler, but it's hardly intuitive.

So a first pass solution could be something as simple as having Python take in the arguments, sort them (keeping their parameters next to them), and return the sorted arguments. The shell script still does some parsing and sanitization, but the user can input much more arbitrary content than the shell script natively accepts, something like:

# script.sh -o -aR --dir /tmp/test --verbose

#!/bin/bash

args=$(order.py "$@")
# args is set to "-a --dir /tmp/test -o -R --verbose"

# simpler processing now that we can guarantee the order of parameters

There are some obvious limitations here, notably that order.py can't distinguish between a final option with an argument and the start of indexed arguments, but that doesn't seem that terrible.
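For illustration, here is a minimal sketch of what the core of order.py might look like. The function name `order_args` and the `takes_value` parameter are hypothetical, and the sketch assumes the script is told up front which options consume a value, which sidesteps (rather than solves) the limitation mentioned above.

```python
def order_args(argv, takes_value=("--dir",)):
    """Hypothetical core of order.py: sort options, keeping values next to them.

    `takes_value` is an assumption: the set of options that consume the
    following word. Positional arguments are kept, unsorted, at the end.
    """
    groups = []       # each entry is [option] or [option, value]
    positionals = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            # everything after a bare "--" is positional
            positionals.extend(argv[i + 1:])
            break
        if arg in takes_value and i + 1 < len(argv):
            groups.append([arg, argv[i + 1]])
            i += 1
        elif arg.startswith("--"):
            groups.append([arg])
        elif arg.startswith("-") and len(arg) > 1:
            # split bundled short flags: -aR becomes -a and -R
            groups.extend(["-" + ch] for ch in arg[1:])
        else:
            positionals.append(arg)
        i += 1
    # sort case-insensitively by option name, ignoring leading dashes
    groups.sort(key=lambda g: g[0].lstrip("-").lower())
    return [word for g in groups for word in g] + positionals


print(" ".join(order_args(["-o", "-aR", "--dir", "/tmp/test", "--verbose"])))
```

A real order.py would also need to shell-quote each word before printing (for example with `shlex.quote`), since the shell script receives the result as a single string.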

So here's my question: 1) Is there any existing (Python preferably) utility to enable CLI parsing by something more powerful than bash, which can then be accessed by the rest of my bash script after sanitization, or 2) Has anyone done this before? Are there issues or pitfalls or better solutions I'm not aware of? Care to share your implementation?


One (very half-baked) idea:

#!/bin/bash

# Some sort of simple syntax to describe to Python what arguments to accept
opts='
"a", "append", boolean, help="Append to existing file"
"dir", str, help="Directory to run from"
"o", "overwrite", boolean, help="Overwrite duplicates"
"R", "recurse", boolean, help="Recurse into subdirectories"
"v", "verbose", boolean, help="Print additional information"
'

# Takes in CLI arguments and outputs a sanitized structure (JSON?) or fails
p=$(parse.py "Runs complex_function with nice argument parsing" "$opts" "$@")
if [ $? -ne 0 ]; then exit 1; fi # parse.py prints usage to stderr on failure

# Takes the sanitized structure and an argument to get
append=$(arg.py "$p" append)
overwrite=$(arg.py "$p" overwrite)
recurse=$(arg.py "$p" recurse)
verbose=$(arg.py "$p" verbose)

cd "$(arg.py "$p" dir)"

complex_function $append $overwrite $recurse $verbose

Two lines of code, along with concise descriptions of the arguments to expect, and we're on to the actual script behavior. Maybe I'm crazy, but that seems way nicer than what I feel like I have to do now.
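To make the idea a little more concrete, here is a rough sketch of what the two helpers could do internally, using argparse and JSON. The names `build_parser`, `parse_to_json`, and `get_arg` are hypothetical, the options are hard-coded rather than read from the `$opts` description, and the `"."` default for `--dir` is an assumption.

```python
import argparse
import json


def build_parser(description):
    """Build a parser for the options listed in the example above."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("-a", "--append", action="store_true", help="Append to existing file")
    parser.add_argument("--dir", default=".", help="Directory to run from")  # default is an assumption
    parser.add_argument("-o", "--overwrite", action="store_true", help="Overwrite duplicates")
    parser.add_argument("-R", "--recurse", action="store_true", help="Recurse into subdirectories")
    parser.add_argument("-v", "--verbose", action="store_true", help="Print additional information")
    return parser


def parse_to_json(description, argv):
    """The parse.py side: validate argv and emit a JSON object.

    On invalid input argparse prints usage to stderr and exits non-zero.
    """
    return json.dumps(vars(build_parser(description).parse_args(argv)))


def get_arg(blob, name):
    """The arg.py side: pull one value out of the JSON object as a string."""
    value = json.loads(blob)[name]
    if isinstance(value, bool):
        return "1" if value else ""
    return str(value)


blob = parse_to_json("Runs complex_function", ["-o", "-aR", "--dir", "/tmp/test", "--verbose"])
print(get_arg(blob, "dir"))
```

One cost of this design is a Python process per lookup; emitting all the values in a single call would avoid that.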


I've seen Parsing shell script arguments and things like this wiki page on easy CLI argument parsing, but many of these patterns feel clunky and error prone, and I dislike having to re-implement them every time I write a shell script, especially when Python, Java, etc. have such nice argument processing libraries.

dimo414
  • did you try [getopt](http://linux.die.net/man/1/getopt)? – tuxuday Jul 27 '12 at 05:13
  • @tuxuday you beat me to it... getopt should help dimo414 – Michael Ballent Jul 27 '12 at 05:17
  • I've used getopt and getopts before (see link to wiki page at bottom of question) but they still have limitations - to quote the link: "getopt cannot handle empty arguments strings, or arguments with embedded whitespace." and "[getopts] can only handle short options (-h) without trickery." I realize there are solutions available in bash, but IMHO the options available to Python are superior, and easier to wrangle. I'm curious about the feasibility / existence of Python utilities to accomplish this. "You're dumb, use bash." may ultimately be an acceptable answer to this question. – dimo414 Jul 27 '12 at 05:24
  • @dimo414, we use `getopt` and are happy with it. Check whether `getopt` addresses your requirements; if it doesn't, then drop it. In my experience the `getopt` command is good enough for most needs. – tuxuday Jul 27 '12 at 05:35
  • Thank you yes, I have used getopt as well. Like I said, *I* am not happy with it. It does work, but I think a better tool exists, or could be made. I'm trying to explore the feasibility of this idea, not rehash that getopt is one option for parsing command line arguments. – dimo414 Jul 27 '12 at 05:39
  • what about treating arguments as strings between "-", so split("-") and sort then – Luka Rahne Jul 27 '12 at 06:10
  • Why not write the whole script in Python? There exists a lot of modules (e.g. [`shutil`](http://docs.python.org/library/shutil.html)) that can do most if not all that can be done in a Bash script. – Some programmer dude Jul 27 '12 at 07:33
  • @JoachimPileborg - absolutely, and like I said, I often do write Python scripts. But there are still advantages of shell scripting, and when there's something complex I want to do in bash, wrapping it all up in Python is sometimes even more kludgy than having bash parse arguments. That's why I'd love to split the parsing off into its own script - it seems completely possible to get the best of both worlds. – dimo414 Jul 27 '12 at 12:55
  • @ralu, good start, but that doesn't work for long (`--dir`) arguments, and there are edge cases that fail. For instance `script.sh -dir /tmp/my-dashed-file -a -b` would come back as `-a -b -dashed -dir /tmp/my -file`. Splitting on `' -'` might be what you meant, which would be slightly better, but would still fail on `script.sh -t "This string -10+4/3 shouldn't be parsed"`. In general, best to let the shell split the input string up, and have your script only do the argument parsing. – dimo414 Jul 27 '12 at 13:11

4 Answers


You could potentially take advantage of associative arrays in bash to help obtain your goal.

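# getopts.py prints shell-quoted [key]=value pairs; eval makes bash parse them as subscripts
eval "declare -A opts=($(getopts.py "$@"))"
cd "${opts[dir]}"
complex_function ${opts[append]} ${opts[overwrite]} ${opts[recurse]} \
                 ${opts[verbose]} ${opts[args]}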

To make this work, getopts.py should be a python script that parses and sanitizes your arguments. It should print a string like the following:

[dir]=/tmp
[append]=foo
[overwrite]=bar
[recurse]=baz
[verbose]=fizzbuzz
[args]="a b c d"
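As a rough sketch of the output side of such a getopts.py, the snippet below formats a dictionary of parsed values as `[key]=value` lines. The function name `format_assoc` is hypothetical, and it assumes the keys are plain identifiers; the values are passed through `shlex.quote` so that whitespace and quotes are not re-split by bash.

```python
from shlex import quote


def format_assoc(values):
    """Format parsed options as [key]=value lines for a bash associative array.

    Assumes keys are simple identifiers. List values (such as leftover
    positional arguments) are joined with spaces before quoting.
    """
    lines = []
    for key, val in sorted(values.items()):
        if isinstance(val, list):
            val = " ".join(val)
        lines.append("[%s]=%s" % (key, quote(str(val))))
    return "\n".join(lines)


print(format_assoc({"dir": "/tmp", "append": "foo", "args": ["a", "b", "c", "d"]}))
```

The dictionary could come from `vars(parser.parse_args())` with argparse, or from `opts.__dict__` with optparse.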

You could set aside values for checking that the options were able to be properly parsed and sanitized as well.

Returned from getopts.py:

[__error__]=true

Added to bash script:

if [[ ${opts[__error__]} == true ]]; then
    exit 1
fi

If you would rather work with the exit code from getopts.py, you could play with eval:

getopts=$(getopts.py "$@") || exit 1
eval "declare -A opts=($getopts)"

Alternatively:

getopts=$(getopts.py "$@")
if [[ $? -ne 0 ]]; then
    exit 1
fi
eval "declare -A opts=($getopts)"
Swiss
  • Clever idea with the associative arrays! That would be quite nice - though unfortunately in my work environment I can't guarantee that all the machines have that. That said, it wouldn't be too hard to roll-my-own and have the python script return a bunch of variable assignments, like `in_dir=/tmp; in_append=0;` etc. – dimo414 Jul 27 '12 at 13:03

Edit: I haven't used it (yet), but if I were posting this answer today I would probably recommend https://github.com/docopt/docopts instead of a custom approach like the one described below.


I've put together a short Python script that does most of what I want. I'm not convinced it's production quality yet (notably error handling is lacking), but it's better than nothing. I'd welcome any feedback.

It takes advantage of the set builtin to re-assign the positional arguments, allowing the remainder of the script to still handle them as desired.

bashparse.py

#!/usr/bin/env python

'''
Uses Python's optparse library to simplify command argument parsing.

Takes a usage string as argv[1], a set of optparse arguments separated by newlines as argv[2],
and the command line arguments as argv[3:], and outputs a series of bash commands
to populate associated variables.
'''

import optparse, sys
try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2

class _ThrowParser(optparse.OptionParser):
    def error(self, msg):
        """Overrides optparse's default error handling
        and instead raises an exception which will be caught upstream
        """
        raise optparse.OptParseError(msg)

def gen_parser(usage, opts_ls):
    '''Takes a list of strings which can be used as the parameters to optparse's add_option function.
    Returns a parser object able to parse those options
    '''
    parser = _ThrowParser(usage=usage)
    for opts in opts_ls:
        if opts:
            # yes, I know it's evil, but it's easy
            eval('parser.add_option(%s)' % opts)
    return parser

def print_bash(opts, args):
    '''Takes the result of optparse and outputs commands to update a shell'''
    for opt, val in opts.items():
        if val:
            # str() lets non-string values, such as store_true flags, be quoted
            print('%s=%s' % (opt, quote(str(val))))
    print("set -- %s" % " ".join(quote(a) for a in args))

if __name__ == "__main__":
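    # argv[1] is the usage string and argv[2] the option specs, so both are required
    if len(sys.argv) < 3:
        sys.stderr.write("Needs at least a usage string and a set of options to parse\n")
        sys.exit(2)
    parser = gen_parser(sys.argv[1], sys.argv[2].split('\n'))

    try:
        (opts, args) = parser.parse_args(sys.argv[3:])
    except optparse.OptParseError as e:
        # report the parse error raised by _ThrowParser and signal failure
        sys.stderr.write("%s\n" % e)
        sys.exit(2)
    print_bash(opts.__dict__, args)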

Example usage:

#!/bin/bash

usage="[-f FILENAME] [-t|--truncate] [ARGS...]"
opts='
"-f"
"-t", "--truncate",action="store_true"
'

echo "$(./bashparse.py "$usage" "$opts" "$@")"
eval "$(./bashparse.py "$usage" "$opts" "$@")"

echo
echo OUTPUT

echo "$f"
echo "$@"
echo "$0" "$2"

Which, if run as: ./run.sh one -f 'a_filename.txt' "two' still two" three outputs the following (notice that the internal positional variables are still correct):

f=a_filename.txt
set -- one 'two'"'"' still two' three

OUTPUT
a_filename.txt
one two' still two three
./run.sh two' still two
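The quoting in the `set --` line is what keeps the positional arguments intact. As a small self-contained check (the helper name `set_command` is only for illustration), the snippet below builds the same line with `shlex.quote` and uses `shlex.split`, which follows POSIX shell word-splitting rules, to confirm that the original words come back unchanged.

```python
import shlex


def set_command(args):
    """Build a `set -- ...` line whose words survive eval in the shell."""
    return "set -- " + " ".join(shlex.quote(a) for a in args)


args = ["one", "two' still two", "three"]
cmd = set_command(args)
print(cmd)

# Splitting the line the way a POSIX shell would recovers the original words
assert shlex.split(cmd)[2:] == args
```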

Disregarding the debugging output, you're looking at approximately four lines to construct a powerful argument parser. Thoughts?

dimo414

Having the very same needs, I ended up writing an optparse-inspired parser for bash (which actually uses python internally); you can find it here:

https://github.com/carlobaldassi/bash_optparse

See the README at the bottom for a quick explanation. You may want to check out a simple example at:

https://github.com/carlobaldassi/bash_optparse/blob/master/doc/example_script_simple

From my experience, it's quite robust (I'm super-paranoid), feature-rich, etc., and I'm using it heavily in my scripts. I hope it may be useful to others. Feedback/contributions welcome.

  • You created a sophisticated tool and it has beautiful declarative syntax. Why not allow people to use with NORMAL installation instructions? Really, I just don't know how to install it. I tried ./configure; make; make install and it fails. – snowindy Oct 30 '15 at 08:25

The original premise of my question assumes that delegating to Python is the right approach to simplify argument parsing. If we drop the language requirement we can actually do a decent job* in Bash, using getopts and a little eval magic:

main() {
  local _usage='foo [-a] [-b] [-f val] [-v val] [args ...]'
  eval "$(parse_opts 'f:v:ab')"
  echo "f=$f v=$v a=$a b=$b -- $#: $*"
}

main "$@"

The implementation of parse_opts is in this gist, but the basic approach is to convert options into local variables which can then be handled like normal. All the standard getopts boilerplate is hidden away, and error handling works as expected.

Because it uses local variables within a function, parse_opts is not just useful for command line arguments, it can be used with any function in your script.
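To illustrate the mapping from an optstring such as `'f:v:ab'` to variable assignments, here is the same idea sketched with Python's `getopt` module. This is not the gist's implementation, which is pure Bash; the function name `opts_to_locals` and the choice of `1` for boolean flags are assumptions made for the example.

```python
import getopt
from shlex import quote


def opts_to_locals(optstring, argv):
    """Turn parsed short options into `local name=value` lines.

    Flags declared with a trailing ':' in the optstring carry their value;
    plain flags are set to 1 (an assumption for this sketch). Remaining
    arguments are re-assigned with `set --`.
    """
    opts, rest = getopt.getopt(argv, optstring)
    lines = []
    for flag, value in opts:
        name = flag.lstrip("-")
        if name + ":" in optstring:
            lines.append("local %s=%s" % (name, quote(value)))
        else:
            lines.append("local %s=1" % name)
    lines.append("set -- " + " ".join(quote(a) for a in rest))
    return "\n".join(lines)


print(opts_to_locals("f:v:ab", ["-a", "-f", "x y", "rest"]))
```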


* I say "decent job" because Bash's getopts is a fairly limited parser and only supports single-letter options. Elegant, expressive CLIs are still better implemented in other languages like Python. But for reasonably small functions or scripts this provides a nice middle ground without adding too much complexity or bloat.

dimo414