9

I am trying to find the most Pythonic way to take a string containing command line options:

"-t 500 -x -c 3 -d"

And turn it into a dictionary

{"-t":"500", "-x":True, "-c":"3", "-d": True}

UPDATE: The string should also be able to contain --long options, and words with dashes in the middle:

"-t 500 -x -c 3 -d --long-option 456 -testing weird-behaviour"

Before suggesting that I look into the optparse module, keep in mind that I don't know what the valid options are. I am just trying to put the string into a dictionary so that I can modify it based on a different dictionary of options.

The approach I am considering is to use split() to get the items into a list, then walk the list looking for items that begin with a dash "-", use each of them as a key, and somehow take the next item in the list as its value. The problem I have is with options that don't have values. I thought of doing something like:

for i in range(0, len(opt_list)):
    if opt_list[i][0] == "-":
        if len(opt_list) > i+1 and not opt_list[i+1][0] == "-":
            opt_dict[opt_list[i]] = opt_list[i+1]
        else:
            opt_dict[opt_list[i]] = True

But it seems like I am programming in C not Python when I do that...

Plazgoth
  • 1
    `list[i][0] == '-'` -> `lst[i].startswith('-')` (don't use `list` or `dict` as variable names -- that could lead to a bad day). You could also use `enumerate`, but that probably doesn't help too much... – mgilson Aug 17 '12 at 17:18
  • Thanks for the startswith() pointer. Yeah I am not using them as variable names, just in this example changed it to avoid confusion. – Plazgoth Aug 17 '12 at 17:25
  • Can the command line options be quoted? Is there any `--` flag that prevents subsequent arguments from being interpreted as flags? – Mike Samuel Aug 17 '12 at 17:46
  • Plazgoth: I added an edit to my answer to explain that what you want isn't actually possible to unambiguously parse with an arbitrary list of options (and allowing values to start with a '-') – Gerrat Aug 17 '12 at 20:52

7 Answers

8

To handle spaces inside quotes correctly you could use shlex.split():

import shlex

cmdln_args = ('-t 500 -x -c 3 -d --long-option 456 '
              '-testing "weird -behaviour" -m "--inside"')

args = shlex.split(cmdln_args)
options = {k: True if v.startswith('-') else v
           for k,v in zip(args, args[1:]+["--"]) if k.startswith('-')}

from pprint import pprint
pprint(options)

Output

{'--inside': True,
 '--long-option': '456',
 '-c': '3',
 '-d': True,
 '-m': True,
 '-t': '500',
 '-testing': 'weird -behaviour',
 '-x': True}
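
For comparison, here is a minimal sketch of how `shlex.split()` differs from plain `str.split()` on a quoted value. The `line` string is only an illustrative input, not part of the original question:

```python
import shlex

line = '-testing "weird -behaviour" -x'

# str.split() breaks on every run of whitespace and keeps the quote characters.
naive_tokens = line.split()
# shlex.split() applies shell-like quoting rules, so the quoted text stays one token.
shell_tokens = shlex.split(line)

print(naive_tokens)  # ['-testing', '"weird', '-behaviour"', '-x']
print(shell_tokens)  # ['-testing', 'weird -behaviour', '-x']
```

The quotes are consumed during splitting, which is why a quoted "--inside" ends up looking exactly like an unquoted option afterwards.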
jfs
  • +1 - thanks for the detailed comment on my answer, and this is a great solution. – Jay Aug 18 '12 at 00:46
  • +1 for the detailed solution that can handle the even weirder behavior of quoted strings and spaces. – Plazgoth Aug 20 '12 at 16:21
  • From my understanding of the example it seems that the -m option should have a value of "--inside", instead the code seems to interpret them as separate options. – Plazgoth Aug 20 '12 at 16:24
  • @Plazgoth: no, it is the expected behavior. `shlex.split()` emulates how the shell works. try: `python -c "import sys; print(sys.argv)" -m "--inside"`. Notice: `-m` and `--inside` are treated the same way. – jfs Aug 20 '12 at 18:10
  • +1 for the dictionary comprehension, which I had never seen before! What is the name of this form? – Stephane Rolland Aug 23 '12 at 15:32
3

You could use regular expressions like so:

import re

args = "-t 500 -x -c 3 -d --long-option 456 -testing weird-behaviour"
matches = re.findall(r'(--?[\w-]+)(.*?)(?= -|$)', args)

result = {}
for match in matches:
    result[match[0]] = True if not match[1] else match[1].strip()

print(result)

and the result is equal to

{
'-d': True, 
'-c': '3', 
'-t': '500', 
'--long-option': '456', 
'-x': True, 
'-testing': 'weird-behaviour'
}

Regular Expression breakdown:

(--?[\w-]+)(.*?)(?= -|$)

  • (--?[\w-]+) matches any character or word (dashes allowed in the word) that starts with a "-" or a "--".
  • (.*?) matches any character 0 or more times in a non-greedy or minimal fashion by using the question mark.
  • (?= -|$) is a positive lookahead. It checks that what we are looking for is followed by a " -" or the end of the string but it does not include it in the match.

Note the use of parentheses in this regular expression. They create groups, so when we call findall it returns each match as a tuple of those groups.
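
To make the grouping concrete, this small sketch prints what `re.findall` returns for a shortened input (the shorter `args` string is just for illustration). Each tuple holds the option and the raw text captured after it, including the leading space:

```python
import re

args = "-t 500 -x --long-option 456"

# Two capturing groups, so findall returns a list of 2-tuples.
matches = re.findall(r'(--?[\w-]+)(.*?)(?= -|$)', args)

print(matches)
# [('-t', ' 500'), ('-x', ''), ('--long-option', ' 456')]
```

An option without a value yields an empty second element, which is what the loop above turns into True.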

Jay
  • Great, I think it's the most Pythonic. – Stephane Rolland Aug 17 '12 at 19:00
  • I don't understand the (.*?)(?= -|$) in your regular expression. What does it mean? – Stephane Rolland Aug 17 '12 at 19:01
  • @StephaneRolland I just got back from lunch. I've added in a breakdown of the regular expression - I hope it helps! – Jay Aug 17 '12 at 19:23
  • 1
    In general I stray away from regular expressions just because I think they are a pain to maintain across multiple owners. What makes sense to one person takes a while to grok by the next. Ignoring that, which one do you think is more efficient? Using 're' or just a loop like in my example? – Plazgoth Aug 17 '12 at 20:26
  • Also, the regular expression solution as is does not handle '--' options correctly. I am sure it can be modified to do so, but that is one of the reasons I stay away from 're': it is not easy to keep flexible. – Plazgoth Aug 17 '12 at 20:31
  • A simple `-?` does the trick for an optional double `-` at the beginning of an option & I've updated my answer. I'll work on the dash in the middle of an option as well, e.g. `--long-option`. I understand that regular expressions are difficult for some people to grasp - this one in particular is quite simple but it could grow into a monster as more requirements are thrown at it, see [this](http://stackoverflow.com/questions/10695143/#10695273) question for an example of that! As for efficiency I don't know which is better. – Jay Aug 17 '12 at 21:10
  • I've updated the answer to include options with dashes in them. If you needed any more symbols e.g. `--long-option-@2-#1` or something similar you would have to modify the regex to support it, which isn't desirable in your case. If I think of any other non-regex solutions that aren't already posted I'll update my answer. – Jay Aug 17 '12 at 21:24
  • 1
    +1 for the clean regex. But if the requirements are not fixed you would reimplement something like shlex.split() eventually, [example in my answer](http://stackoverflow.com/a/12013711/4279). also `result = {k: True if not v.strip() else v.strip() for k, v in matches}` note: v.strip() inside `if` i.e., you might need `\s*` in the regex – jfs Aug 17 '12 at 22:22
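
A sketch of the dict-comprehension variant mentioned in the last comment, using the regex from the answer unchanged and stripping the captured value in Python rather than in the pattern. The `v.strip() or True` shorthand is my own simplification of the suggested expression:

```python
import re

args = "-t 500 -x -c 3 -d --long-option 456 -testing weird-behaviour"
matches = re.findall(r'(--?[\w-]+)(.*?)(?= -|$)', args)

# An empty (or whitespace-only) capture means the option had no value.
result = {k: v.strip() or True for k, v in matches}

print(result)
```

Stripping after the match keeps the pattern itself untouched, so the lookahead still decides where each value ends.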
2

Argument Parsing for Humans - https://github.com/kennethreitz/args

fabiocerqueira
1

I can't speak to the most Pythonic way, but here's a 1-liner:

opt_list = "-t 500 -x -c 3 -d"

dict((e if len(e) >1 else (e[0],True) for e in (elem.split() 
      for elem in ('-'+d for d in opt_list.split('-') if d))))

>>>{'-t': '500', '-x': True, '-c': '3', '-d': True}

[Edit: As Matthias pointed out, this won't work for values with a '-' in them]

...however, in general, I don't think the OP's problem can be solved unambiguously when you allow a '-' at the start of option values.

consider these simple options:

"-a -b"

Is this:

  • {'-a': '-b'},
  • {'-a':True, '-b':True}

???
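
One way to see the ambiguity is that both readings are valid, and which one you get depends on knowledge the string itself does not carry. The sketch below assumes a hypothetical `takes_value` set naming the options that consume the next token; `parse` and `takes_value` are illustrative names, and the sketch treats every remaining token as a flag:

```python
def parse(tokens, takes_value=frozenset()):
    """Pair an option with the next token only if it is listed in takes_value."""
    options = {}
    it = iter(tokens)
    for token in it:
        if token in takes_value:
            # Consume the next token as the value, even if it starts with '-'.
            options[token] = next(it, True)
        else:
            options[token] = True
    return options

tokens = "-a -b".split()
print(parse(tokens))                      # {'-a': True, '-b': True}
print(parse(tokens, takes_value={"-a"}))  # {'-a': '-b'}
```

Without something like `takes_value`, a parser has to pick one reading by convention, which is what the "a value never starts with a dash" rule in the other answers does.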

Gerrat
  • 1
    Clever splitting on `"-"` instead of the (seemingly) more natural `" "`. I think if you split this up into a multi-liner which is easier to grok, you might have a decent solution here ... – mgilson Aug 17 '12 at 17:35
  • I like the 1-liner approach, but it will take me a bit to grok – Plazgoth Aug 17 '12 at 17:47
  • 3
    If we don't know anything about the options and values, what happens if the value is `weird-behaviour`. – Matthias Aug 17 '12 at 18:19
  • @Matthias: well, it won't work...but if a value is, let's say: `-10.0` most of the other solutions won't work either...not sure if there's really a fool proof way. – Gerrat Aug 17 '12 at 19:38
  • Agreed I don't want to prevent dashes in the middle of a word so I would avoid splitting on '-' – Plazgoth Aug 17 '12 at 20:23
  • You could always split on `" -"` (that is, "space, dash"). That would avoid breaking up options or values with dashes in them. You'd only need to avoid dashes being the first character of a value. – Blckknght Aug 17 '12 at 23:40
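
A rough sketch of the "split on space, dash" idea from the last comment, assuming values never begin with a dash. The function name `parse_options` is made up for this example:

```python
def parse_options(line):
    """Split on ' -' so that dashes inside words are left alone."""
    options = {}
    # Prefix a space so the first option is split off like all the others.
    for chunk in (" " + line).split(" -"):
        if not chunk.strip():
            continue
        # Restore the dash removed by the split, then separate key from value.
        key, _, value = ("-" + chunk).partition(" ")
        options[key] = value.strip() or True
    return options

print(parse_options("-t 500 -x --long-option 456 -testing weird-behaviour"))
```

Long options survive because only the first dash is eaten by the split; the second one stays in the chunk and the restored dash puts `--` back together.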
0
>>> results = "-t 500 -x -c 3 -d".split()
>>> rd = {}
>>> i = 0
>>> while i < len(results):
...    if results[i].startswith("-"):
...       rd[results[i]]=True
...       try:
...          if not results[i+1].startswith("-"):
...             rd[results[i]] = results[i+1]
...       except IndexError: pass
...    i+=1
...
>>> rd
{'-t': '500', '-x': True, '-c': '3', '-d': True}

But it's pretty similar to what you have.

Joran Beasley
0

This is a harder problem than it initially appears; this is my first attempt. It simply loops over the arguments and checks whether each one starts with a -. If it does and the next argument doesn't, the two items are added to a dictionary as key and value; otherwise the current argument is added with True. The try is needed in case the final item in the argument list starts with a -.

args = "-t 500 -x -c 3 -d".split()

d = {}

for i, item in enumerate(args):
    if item.startswith('-'):
        try:
            if args[i+1].startswith('-'):
                d[item] = True
            else:
                d[item] = args[i+1]
        except IndexError:
            d[item] = True

print(d)  # prints {'-t': '500', '-x': True, '-c': '3', '-d': True}

Edit: An alternative solution, inspired by Gerrat's splitting on - is the following:

args = "-t 500 -x -c 3 -d".split('-')

d = {}

for arg in args:
    if arg:
        try:
            k, v = arg.split()
        except ValueError:
            k, v = arg.strip(), True

        d['-' + k] = v  # put back the dash that split('-') removed

However, as Matthias points out, this may not work if the options and values have -s within them.

Community
Chris
0
import re

myDictionnary = {}

strPattern1 = r"-[0-9a-z ]*"
strPattern2 = r"(-[0-9a-z]+) *(.*)"  # group 1 keeps the dash so keys look like "-t"
strToParse = "-t 500 -x -c 3 -d"

listKeyValues = re.findall(strPattern1, strToParse)

for kv in listKeyValues:

    match = re.search(strPattern2, kv)

    key = match.group(1)
    value = match.group(2).strip()  # drop the trailing space matched by strPattern1

    if len(value) > 0:
        myDictionnary[key] = value
    else:
        myDictionnary[key] = True
Stephane Rolland