does a publicly available partial solution exist to parse *nix-style command line options without pre-knowledge of the keys?

Question

does a publicly available partial solution exist (in any language) to parse *nix command line options into a data structure in the case where the option keys are not known in advance?

basically, parse something like

my-script -x ex --y=why zebra

and get

{'x': 'ex', 'y': 'why'}

without knowing that the option keys will be x and y before parsing.

similar questions have been asked regarding Perl and Java, but they received no positive responses.

i understand that "command line options" are not a well-defined syntax and that any such solution will not produce the desired output for all inputs, but am asking if any such partial solution is known.

https://github.com/search?l=Shell&q=bash+framework&type=Repositories&utf8=%E2%9C%93 — kojiro, Jun 10 '15 at 19:43
I've had very positive experiences with Python's argparse, to the point that I often develop my scripts in Python rather than bash. — rr-, Jun 10 '15 at 19:59
hm.. and what should be the result for `-x -100 -y -200`? And for `-x -b -y -c`? And `-1 -2`? — clt60, Jun 10 '15 at 19:59
[`argparse`](https://docs.python.org/3/library/argparse.html), way to go. — 4ae1e1, Jun 10 '15 at 20:06
Other than that: (roughly) POSIX compatible shells have `getopts`, which supports short options only; there's also an external `getopt`, but only Linux `getopt(1)` can parse GNU style long options. Perl has [`Getopt::Std`](http://perldoc.perl.org/Getopt/Std.html), but still, short options only. — 4ae1e1, Jun 10 '15 at 20:08
For Perl there are dozens of different arg parsers: https://metacpan.org/search?q=Getopt :) long options, short options, mixed, trees, etc. But you still need to know what you want to parse (see the examples above: is `-100` a parameter for `-x`, or an argument on its own? e.g. `{"x": -100}` or `["x", "-100"]`) — clt60, Jun 10 '15 at 20:13
thank you @jm666 for reading and understanding the question instead of just name dropping your fav well-known libraries :) there is no syntax for command line options, so you can literally write anything with some dashes and spaces and characters and say "what about that"? i know. i'm interested in looking at how people have actually approached this problem. — nrser, Jun 10 '15 at 21:40
@jm666 in response to your first comment - my guess is that if you're going to say that `-` starts an option key and that `-x` flag style and `-x 1` key/value style are parsed, then i think you would have to say that tokens that start with a `-` are keys, not values. that would mean no naturally-written negative values (`"-x -100 -y -200" => {x: true, 100: true, y: true, 200: true}`). but this is why i'm interested in seeing what people have done and if anyone is using it. i know there are **tons** of options out there for parsing when you know the option key names. — nrser, Jun 10 '15 at 21:47
If you want to parse something, you need to define some **rules**. One version of the rules: knowing the possible arguments beforehand. Another possibility: require that parameters be assigned with `=`, such as `--x=-100`. Many other variants are possible. For example, just use JSON directly as an argument. ;) But parsing arbitrary arguments into something meaningful without strict rules is impossible. Also ask yourself: do you want to reinvent the wheel, or would you rather use something that is commonly used and well known? — clt60, Jun 10 '15 at 22:22
Also, be prepared: this question will probably get closed in this form, because it is too broad and opinion-based. — clt60, Jun 10 '15 at 22:26
@jm666 yeah, i know. what i'm trying to do here is find some rules (*besides* know the opt keys beforehand) that people have used so i can take a look at them. — nrser, Jun 10 '15 at 22:29
@nrser, StackOverflow is aimed at questions that have canonical, correct answers. Polling folks for approaches is out-of-scope, and questions that ask for library and tool suggestions are **explicitly** against the rules. — Charles Duffy, Jun 10 '15 at 22:29
@nrser, the other thing is that it's impossible to parse `-x ex --y=why zebra` and get the result you want without knowing that `-x` takes an argument. Otherwise, if you support GNU conventions (allowing mixing of positional arguments and options), the desired effect could be to set the _flag_ `x` to true and treat `ex` like a positional argument, just as `zebra` is also a positional argument. — Charles Duffy, Jun 10 '15 at 22:32
@nrser, ...moreover, being schemaless, you would have to disallow values starting with dashes from being passed, since you have no way to distinguish between a value (from a key/value pair) starting with a dash and a key from a future pair. Which is to say: There's a reason you aren't finding schemaless command-line parsers in any language. — Charles Duffy, Jun 10 '15 at 22:35
@CharlesDuffy sorry, i thought "does anything that does this exist" was pretty specific - it's a yes or no answer in fact, and the yes part is easy to canonically verify. — nrser, Jun 10 '15 at 22:35
@CharlesDuffy yup, all the things you're bringing up are issues. you could not start values with dashes and you could not mix positional args in with options (without involving some broader probabilistic analysis). — nrser, Jun 10 '15 at 22:46

Charles Duffy · Accepted Answer · 2015-06-10T23:00:37.387

3

Parsing UNIX command-line options in general is not possible in a schemaless manner, especially when supporting GNU conventions (i.e., allowing options and positional arguments to be intermixed).

Consider the usage you gave here:

my-script -x ex --y=why zebra

Now, should this be:

{options: {x: "ex", y: "why"}, arguments: ["zebra"]}

...or should it be...

{options: {x: True, y: "why"}, arguments: ["ex", "zebra"]}

The answer is that you don't know without knowing whether x accepts an argument -- meaning that you need a schema.

Consider also:

nice -n -1

Is -1 an argument to -n, or is 1 an option key in its own right? Again, you can't tell.
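
To make the schema dependence concrete, here is a minimal sketch using bash's `getopts` builtin. The helper name `parse_with` and its output format are assumptions made up for this illustration; the only thing that changes between calls is the optstring, which plays the role of the schema.

```bash
#!/usr/bin/env bash
# parse_with OPTSTRING ARGS...
# Parse ARGS with getopts under the given optstring, print one line per
# option found, then print the leftover positional arguments.
# (parse_with is a hypothetical helper, not part of any library.)
parse_with() {
  local optstring=$1 opt OPTIND=1   # local OPTIND resets getopts per call
  shift
  while getopts ":$optstring" opt; do
    case $opt in
      '?') printf 'unknown option: -%s\n' "$OPTARG" ;;
      ':') printf 'missing argument for: -%s\n' "$OPTARG" ;;
      *)
        # The optstring decides whether this letter takes a value.
        if [[ $optstring == *"$opt:"* ]]; then
          printf 'option %s=%s\n' "$opt" "$OPTARG"
        else
          printf 'flag %s\n' "$opt"
        fi
        ;;
    esac
  done
  shift $(( OPTIND - 1 ))
  printf 'arguments:%s\n' "${*:+ $*}"
}

parse_with 'x:' -x ex zebra   # option x=ex / arguments: zebra
parse_with 'x'  -x ex zebra   # flag x      / arguments: ex zebra
parse_with 'n:' -n -1         # option n=-1 / arguments:
parse_with 'n1' -n -1         # flag n, flag 1 / arguments:
```

The same argument vector yields different results in each pair of calls, and nothing in the arguments themselves says which reading is intended.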

Thus: Schemaless command-line parsers exist, but do not cover enough cases to be widely useful -- and thus are typically isolated within the programs that use them, rather than being made into a library.

A typical schemaless command-line parser in bash (4.0 or newer), by the way, might look like the following:

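# Put key/value pairs into kwargs and all other arguments into args.
# Requires bash 4.0+ for associative arrays.
declare -A kwargs=( )
args=( )
args_only=0   # set to 1 once "--" has been seen
for arg; do   # iterates over "$@"
  if (( args_only )); then
    args+=( "$arg" )
    continue
  fi
  case $arg in
    --) args_only=1 ;;             # everything after "--" is positional
    --*=*)                         # --key=value
      arg=${arg#--}
      kwargs[${arg%%=*}]=${arg#*=}
      ;;
    --*) kwargs[${arg#--}]=1 ;;    # --flag
    -) args+=( "$arg" ) ;;         # lone "-" is positional; kwargs[]=1 would be an error
    -*) kwargs[${arg#-}]=1 ;;      # -flag (clustered letters are not split)
    *) args+=( "$arg" ) ;;         # plain positional argument
  esac
done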

This would work with...

my-script --x=ex --y=why zebra

...resulting in the values:

args=( zebra )
kwargs=( [x]=ex [y]=why )

...and with some more interesting cases as well, but it would still be a long way from handling the general case.
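
As an illustration of where this falls short, the sketch below wraps a variant of the loop above in a function and feeds it the question's original command line. The function name `parse_schemaless` is a hypothetical choice, and treating a lone `-` as positional is an assumption of this sketch. Because the parser cannot know that `-x` takes a value, `ex` ends up among the positional arguments, and a negative number is read as a key.

```bash
#!/usr/bin/env bash
# Requires bash 4.0+ for associative arrays.
declare -A kwargs
declare -a args

# parse_schemaless ARGS...
# Fill the global kwargs/args arrays using dash-based rules only.
# (Hypothetical name; illustration only.)
parse_schemaless() {
  local arg args_only=0
  kwargs=( ) args=( )   # clear results from any previous call
  for arg; do
    if (( args_only )); then args+=( "$arg" ); continue; fi
    case $arg in
      --) args_only=1 ;;
      --*=*) arg=${arg#--}; kwargs[${arg%%=*}]=${arg#*=} ;;
      --*) kwargs[${arg#--}]=1 ;;
      -) args+=( "$arg" ) ;;    # assumption: lone "-" is positional
      -*) kwargs[${arg#-}]=1 ;;
      *) args+=( "$arg" ) ;;
    esac
  done
}

# The question's input: x becomes a flag, "ex" becomes positional.
parse_schemaless -x ex --y=why zebra
echo "x=${kwargs[x]} y=${kwargs[y]} args=${args[*]}"   # x=1 y=why args=ex zebra

# A negative value is indistinguishable from a key.
parse_schemaless -x -100
echo "x=${kwargs[x]} 100=${kwargs[100]}"               # x=1 100=1
```

Getting `{x: "ex", y: "why"}` out of the first call requires either the `--x=ex` form or a rule that the token after a short option is always its value, which in turn breaks plain flags.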

edited Jun 10 '15 at 23:00

answered Jun 10 '15 at 22:42

Charles Duffy


"Schemaless command-line parsers exist": this is what i'm asking about... can you point to one? – nrser Jun 10 '15 at 22:47
@nrser, sure. I've written a few, embedded deep inside other software. That doesn't mean any of them are reusable. Also, they don't parse command-line options "in general"; they parse the restricted set of command-line options that the programs in Suite X allowed. (One of these required all key/value pairs to use full `--key=value` form, for instance). – Charles Duffy Jun 10 '15 at 22:48
so, yes, they exist, but no, they aren't responsive to your question, because they don't handle options "in general". – Charles Duffy Jun 10 '15 at 22:50
ok, cool. i'm basically gathering from this that there isn't really anything out there publicly. – nrser Jun 10 '15 at 22:50
you're right, "in general" is contradictory... i'll remove it. – nrser Jun 10 '15 at 22:50
@nrser, ...I've added a simple one to my answer. – Charles Duffy Jun 10 '15 at 22:52
@nrser it's also allowed to combine options, e.g. `-a -b` can be written as `-ab`. So when the program is called as `cmd -23`, is that one argument `-23`, or two, e.g. `-2` and `-3`? (just check the `comm` command) :) – clt60 Jun 10 '15 at 23:04
...or is it `-2` with the argument `3`? :) – Charles Duffy Jun 10 '15 at 23:09
i think this does a good job of outlining many of the issues with this sort of parsing, and gives decent idea of why prominent partial solutions are not readily available, thanks. – nrser Jun 11 '15 at 05:28
