6

What I need

I need an ArgumentParser with a conflict-handling scheme that resolves a registered set of duplicate arguments but raises on all other duplicates.

What I tried

My initial approach (see also the code example at the bottom) was to subclass ArgumentParser, add a _handle_conflict_custom method, and then instantiate the subclass with CustomParser(conflict_handler='custom'), thinking that the _get_handler method would pick it up.

The Problem

This raises an error. ArgumentParser inherits from _ActionsContainer, which provides _get_handler and the _handle_conflict_{strategy} methods. During initialization the parser also instantiates _ArgumentGroup objects internally. These inherit from _ActionsContainer as well, but not from my subclass, so they don't know about the newly defined method and fail to look up the custom handler.

Overriding the _get_handler method is not feasible for the same reasons.
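
A minimal reproduction of that failure, as a sketch that relies on argparse's private internals and may therefore behave differently in other Python versions. The class name NaiveParser is made up for illustration:

```python
import argparse


class NaiveParser(argparse.ArgumentParser):
    # Hypothetical handler: only the parser subclass knows about it.
    def _handle_conflict_custom(self, action, conflicting_actions):
        pass


# The parser subclass has the method, the stock group class does not.
parser_has_handler = hasattr(NaiveParser, '_handle_conflict_custom')
group_has_handler = hasattr(argparse._ArgumentGroup, '_handle_conflict_custom')
print(parser_has_handler, group_has_handler)

# Constructing the parser creates the default argument groups, and each
# group looks up the handler on itself, which is where the lookup fails.
try:
    NaiveParser(conflict_handler='custom')
except ValueError as exc:
    error = exc
    print(f'ValueError: {exc}')
else:
    error = None
```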

I have created a (rudimentary) class diagram illustrating the relationships, and therefore hopefully the problem in subclassing ArgumentParser to achieve what I want.

[Image: class_diagram.png]

Motivation

I think I need this because I have two scripts that handle distinct parts of a workflow. I would like to be able to use them separately, but also have one script that imports from both and does everything in one go.

This script should support all the options of the two individual scripts, but I don't want to duplicate the (extensive) argument definitions, because then I would have to make changes in multiple places.
This is easily solved by importing the ArgumentParsers of the (part) scripts and using them as parents, like so: combined_parser = ArgumentParser(parents=[arg_parser1, arg_parser2]).

In the scripts I have duplicate options, e.g. for the work directory, so I need to resolve those conflicts.
This could also be done with conflict_handler='resolve'.

But because there are a lot of possible arguments (which is not up to our team, because we have to maintain compatibility), I also want the script to raise an error if something gets defined that causes a conflict, but hasn't been explicitly allowed to do so, instead of quietly overriding the other flag, potentially causing unwanted behavior.

Other suggestions to achieve these goals (keeping both scripts separate, enabling use of one script that wraps both, avoiding code duplication and raising on unexpected duplicates) are welcome.

Example Code

from argparse import ArgumentParser


class CustomParser(ArgumentParser):
    def _handle_conflict_custom(self, action, conflicting_actions):
        # option strings that are allowed to be overridden silently
        registered = ['-h', '--help', '-f']

        # raise if any conflicting option string has not been registered
        # (iterating without unpacking into `action` avoids shadowing the parameter)
        use_error = any(
            option_string not in registered
            for option_string, _ in conflicting_actions
        )

        if use_error:
            self._handle_conflict_error(action, conflicting_actions)
        else:
            self._handle_conflict_resolve(action, conflicting_actions)


if __name__ == '__main__':
    ap1 = ArgumentParser()
    ap2 = ArgumentParser()

    ap1.add_argument('-f')  # registered, so should be resolved
    ap2.add_argument('-f')

    ap1.add_argument('-g')  # not registered, so should raise
    ap2.add_argument('-g')

    # this raises before ever resolving anything, for the stated reasons
    ap3 = CustomParser(parents=[ap1, ap2], conflict_handler='custom')


Other questions

I am aware of these similar questions:

But even though some of them provide interesting insights into argparse usage and conflicts, they seem to address issues that are not related to mine.

Fynn
  • While I probably know `argparse` as well as anyone (I've followed the bug/issues since 2013), I haven't done much with the conflict handler. All the relevant methods are in the `_ActionsContainer` class, not `ArgumentParser`. 'error' and 'resolve' are the two provided methods, but I'm sure a custom one could be added using the same pattern. `_get_handler` translates the user-provided string into a method. To my knowledge few people (users or developers) have tried to expand on this, so you are, for the most part, on your own. – hpaulj Dec 22 '21 at 00:01
  • A couple of things could be confusing you. `add_argument` is inherited from the container, and the `_add_action` is done by a group. Even when adding to a parser, the `_add_action` is delegated to one of the default groups. When using `parents`, groups and actions are copied via the container's `_add_container_actions` method. It's here where conflicts are most likely. Actions are copied by reference. – hpaulj Dec 22 '21 at 00:48
  • https://stackoverflow.com/questions/25818651/argparse-conflict-resolver-for-options-in-subcommands-turns-keyword-argument-int is an SO question that deals with parents and conflict handling. It may not help but it does illustrate the complications. – hpaulj Dec 22 '21 at 01:48

4 Answers

3

While I agree that FMc's approach is probably the better one in terms of long-term viability, I have found a way to inject a custom handler into the ArgumentParser.

The key is to subclass _ActionsContainer, the class that actually defines the handler methods, and then to replace the base classes that ArgumentParser and _ArgumentGroup inherit from so that both pick up the subclass.

In the case below, I've simply added a handler that ignores any conflicts, but you could add any custom logic you want.

import argparse

class IgnorantActionsContainer(argparse._ActionsContainer):
    def _handle_conflict_ignore(self, action, conflicting_actions):
        pass

argparse.ArgumentParser.__bases__ = (argparse._AttributeHolder, IgnorantActionsContainer)
argparse._ArgumentGroup.__bases__ = (IgnorantActionsContainer,)

parser = argparse.ArgumentParser(conflict_handler="ignore")
parser.add_argument("-a", type=int, default=1)
parser.add_argument("-a", type=int, default=2)
parser.add_argument("-a", type=int, default=3)
parser.add_argument("-a", type=int, default=4)
print(parser.parse_args())

Running python custom_conflict_handler.py -h prints:

usage: custom_conflict_handler.py [-h] [-a A] [-a A] [-a A] [-a A]

optional arguments:
  -h, --help  show this help message and exit
  -a A
  -a A
  -a A
  -a A

Running python custom_conflict_handler.py prints:

Namespace(a=1)

Running python custom_conflict_handler.py -a 5 prints:

Namespace(a=5)
Hans
  • That's a neat way to do it, thanks for sharing. I like, that it's a lot more concise! – Fynn May 05 '22 at 15:35
2

For various reasons -- notably the needs of testing -- I have adopted the habit of always defining argparse configuration in the form of a data structure, typically a sequence of dicts. The actual creation of the ArgumentParser is done in a reusable function that simply builds the parser from the dicts. This approach has many benefits, especially for more complex projects.

If each of your scripts were to shift to that model, I would think that you might be able to detect any configuration conflicts in that function and raise accordingly, thus avoiding the need to inherit from ArgumentParser and mess around with understanding its internals.

I'm not certain I understand your conflict-handling needs very well, so the demo below simply hunts for duplicate options and raises if it sees one, but I think you should be able to understand the approach and assess whether it might work for your case. The basic idea is to solve your problem in the realm of ordinary data structures rather than in the byzantine world of argparse.

import sys
import argparse
from collections import Counter

OPTS_CONFIG1 = (
    {
        'names': 'path',
        'metavar': 'PATH',
    },
    {
        'names': '--nums',
        'nargs': '+',
        'type': int,
    },
    {
        'names': '--dryrun',
        'action': 'store_true',
    },
)

OPTS_CONFIG2 = (
    {
        'names': '--foo',
        'metavar': 'FOO',
    },
    {
        'names': '--bar',
        'metavar': 'BAR',
    },
    {
        'names': '--dryrun',
        'action': 'store_true',
    },
)

def main(args):
    ap = define_parser(OPTS_CONFIG1, OPTS_CONFIG2)
    opts = ap.parse_args(args)
    print(opts)

def define_parser(*configs):
    # Validation: adjust as needed.
    tally = Counter(
        nm
        for config in configs
        for d in config
        for nm in d['names'].split()
    )
    for k, n in tally.items():
        if n > 1:
            raise Exception(f'Duplicate argument configurations: {k}')

    # Define and return parser.
    ap = argparse.ArgumentParser()
    for config in configs:
        for d in config:
            kws = dict(d)
            xs = kws.pop('names').split()
            ap.add_argument(*xs, **kws)
    return ap

if __name__ == '__main__':
    main(sys.argv[1:])
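
To let a registered set of duplicates through while still raising on the rest, the validation step could take an allow-list. This is a rough sketch, assuming the same dict layout as above. The allowed_duplicates parameter and the two sample configs are made up for illustration, and the first definition of an allowed duplicate wins:

```python
import argparse
from collections import Counter

# Hypothetical sample configs sharing one option on purpose.
CONFIG_A = (
    {'names': '-w --workdir', 'default': '.'},
    {'names': '--foo'},
)
CONFIG_B = (
    {'names': '-w --workdir', 'default': '/tmp'},
    {'names': '--bar'},
)


def define_parser(*configs, allowed_duplicates=()):
    allowed = set(allowed_duplicates)

    # Validation: raise on every duplicate that was not explicitly allowed.
    tally = Counter(
        nm for config in configs for d in config for nm in d['names'].split()
    )
    unexpected = sorted(k for k, n in tally.items() if n > 1 and k not in allowed)
    if unexpected:
        raise ValueError(f'Duplicate argument configurations: {unexpected}')

    # Build the parser, skipping names that an earlier config already defined.
    ap = argparse.ArgumentParser()
    seen = set()
    for config in configs:
        for d in config:
            kws = dict(d)
            names = [nm for nm in kws.pop('names').split() if nm not in seen]
            seen.update(names)
            if names:
                ap.add_argument(*names, **kws)
    return ap


ap = define_parser(CONFIG_A, CONFIG_B, allowed_duplicates=('-w', '--workdir'))
opts = ap.parse_args([])
print(opts)

try:
    define_parser(CONFIG_A, CONFIG_B)  # nothing allowed, so this raises
except ValueError as exc:
    duplicate_error = exc
else:
    duplicate_error = None
```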
FMc
  • Definite +1 This is a *very* good suggestion. Trying to work around the implementation details of argparse was a maddening experience, that I don't wish on anyone. – Fynn Dec 22 '21 at 08:53
  • I will play around with this approach a bit before accepting (maybe some other cool suggestions pop up in the meantime), but this is probably the way, because it neatly circumvents the need for subclassing, and still gives me the behaviour I'm looking for – Fynn Dec 22 '21 at 08:56
1

There is an approach that's marginally less hacky than Hans's. You can subclass argparse._ActionsContainer and argparse._ArgumentGroup, and then make sure your parser class lists the custom ActionsContainer after argparse.ArgumentParser in its bases. In the resulting MRO the custom container still comes before argparse._ActionsContainer, so its methods take precedence. Here's an example:

import argparse
from typing import Iterable, Any

class ArgumentGroup(argparse._ArgumentGroup):
    def _handle_conflict_custom(
        self,
        action: argparse.Action,
        conflicting_actions: Iterable[tuple[str, argparse.Action]],
    ) -> None:
        ...


class ActionsContainer(argparse._ActionsContainer):
    def _handle_conflict_custom(
        self,
        action: argparse.Action,
        conflicting_actions: Iterable[tuple[str, argparse.Action]],
    ) -> None:
        ...

    def add_argument_group(self, *args: Any, **kwargs: Any) -> ArgumentGroup:
        group = ArgumentGroup(self, *args, **kwargs)
        self._action_groups.append(group)
        return group

class ArgumentParser(argparse.ArgumentParser, ActionsContainer):
    ...
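
One way the placeholders might be filled in for the question's use case, written here as a mixin variant of the same idea rather than the exact class layout above. The names ALLOWED_DUPLICATES, SelectiveConflictMixin, SelectiveGroup and SelectiveParser are hypothetical, and the sketch depends on argparse's private _ArgumentGroup, _handle_conflict_error and _handle_conflict_resolve, so treat it as an illustration only:

```python
import argparse

# Hypothetical registry of option strings that may be redefined silently.
ALLOWED_DUPLICATES = frozenset({'-f'})


class SelectiveConflictMixin:
    def _handle_conflict_selective(self, action, conflicting_actions):
        # Collect conflicting option strings that were not registered.
        unexpected = [
            opt for opt, _ in conflicting_actions if opt not in ALLOWED_DUPLICATES
        ]
        if unexpected:
            self._handle_conflict_error(action, conflicting_actions)
        else:
            self._handle_conflict_resolve(action, conflicting_actions)


class SelectiveGroup(SelectiveConflictMixin, argparse._ArgumentGroup):
    pass


class SelectiveParser(SelectiveConflictMixin, argparse.ArgumentParser):
    def add_argument_group(self, *args, **kwargs):
        # Create groups that also know the custom handler.
        group = SelectiveGroup(self, *args, **kwargs)
        self._action_groups.append(group)
        return group


parser = SelectiveParser(conflict_handler='selective')
parser.add_argument('-f', default='first')
parser.add_argument('-f', default='second')  # registered, so it is resolved
resolved = parser.parse_args([])
print(resolved)

parser.add_argument('-g')
try:
    parser.add_argument('-g')  # not registered, so it raises
except argparse.ArgumentError as exc:
    conflict_error = exc
else:
    conflict_error = None
```

Because the resolve handler removes the earlier definition, the later '-f' is the one that survives. Mutually exclusive groups are not covered by this sketch.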

Brosa
  • This might be the cleanest way yet, but it is impractical for our project, because at this point we have tacked on a lot of stuff, where having our custom data structure is pretty useful. But I might revisit this idea in a future refactor! Thanks for sharing =) – Fynn Jan 28 '23 at 17:29
0

Based on FMc's approach I have created something a little more elaborate. I know this isn't Code Review, but feedback is still welcome. Also, maybe it helps someone to see this fleshed out a bit more.

import argparse

from collections import Counter, OrderedDict
from typing import List, Dict, Any
from copy import deepcopy


class OptionConf:
    def __init__(self):
        self._conf = OrderedDict()  # type: Dict[str, List[Dict[str, Any]]]
        self._allowed_dupes = list()  # type: List[str]

    def add_conf(self, command, *conf_args, **conf_kwargs):
        if command not in self._conf:
            self._conf[command] = []

        conf_kwargs['*'] = conf_args
        self._conf[command].append(conf_kwargs)

    def add_argument(self, *conf_args, **conf_kwargs):
        self.add_conf('add_argument', *conf_args, **conf_kwargs)

    def register_allowed_duplicate(self, flag):
        self._allowed_dupes.append(flag)

    def generate_parser(self, **kwargs):
        argument_parser = argparse.ArgumentParser(**kwargs)
        for command, conf_kwargs_list in self._conf.items():
            command_func = getattr(argument_parser, command)

            for conf_kwargs in conf_kwargs_list:
                list_args = conf_kwargs.pop('*', [])
                command_func(*list_args, **conf_kwargs)
                conf_kwargs['*'] = list_args

        return argument_parser

    def _get_add_argument_conf_args(self):
        for command, kwargs_list in self._conf.items():
            if command != 'add_argument':
                continue
            return kwargs_list
        return []

    def resolve_registered(self, other):
        if self.__class__ == other.__class__:
            conf_args_list = self._get_add_argument_conf_args()  # type: List[Dict[str, Any]]
            other_conf_args_list = other._get_add_argument_conf_args()  # type: List[Dict[str, Any]]

            # find all argument names of both parsers
            all_names = []
            for conf_args in conf_args_list:
                all_names += conf_args.get('*', [])

            all_other_names = []
            for other_conf_args in other_conf_args_list:
                all_other_names += other_conf_args.get('*', [])

            # check for dupes and throw if appropriate
            found_allowed_dupes = []
            tally = Counter(all_names + all_other_names)
            for name, count in tally.items():
                if count > 1 and name not in self._allowed_dupes:
                    raise Exception(f'Duplicate argument configurations: {name}')
                elif count > 1:
                    found_allowed_dupes.append(name)

            # merge them in a new OptionConf, preferring the args of self (AS OPPOSED TO ORIGINAL RESOLVE)
            new_opt_conf = OptionConf()
            for command, kwargs_list in self._conf.items():
                for kwargs in kwargs_list:
                    list_args = kwargs.get('*', [])
                    new_opt_conf.add_conf(command, *list_args, **kwargs)

            for command, kwargs_list in other._conf.items():
                for kwargs in kwargs_list:
                    # if it's another argument, we remove dupe names
                    if command == 'add_argument':
                        all_names = kwargs.pop('*', [])
                        names = [name for name in all_names if name not in found_allowed_dupes]
                        # and only add if there are names left
                        if names:
                            new_opt_conf.add_argument(*deepcopy(names), **deepcopy(kwargs))
                        # put names back
                        kwargs['*'] = all_names
                    else:
                        # if not, we just add it
                        list_args = kwargs.pop('*', [])
                        new_opt_conf.add_conf(command, *deepcopy(list_args), **deepcopy(kwargs))
                        # put list args back
                        kwargs['*'] = list_args

            return new_opt_conf

        raise NotImplementedError()


if __name__ == '__main__':
    opts_conf = OptionConf()

    opts_conf.add_argument('pos_arg')
    opts_conf.add_argument('-n', '--number', metavar='N', type=int)
    opts_conf.add_argument('-i', '--index')
    opts_conf.add_argument('-v', '--verbose', action='store_true')

    opts_conf2 = OptionConf()

    opts_conf2.add_argument('-n', '--number', metavar='N', type=int)
    opts_conf2.add_argument('-v', action='store_true')

    opts_conf.register_allowed_duplicate('-n')
    opts_conf.register_allowed_duplicate('--number')

    try:
        resolved_opts = opts_conf.resolve_registered(opts_conf2)
    except Exception as e:
        print(e)  # raises on -v

    opts_conf.register_allowed_duplicate('-v')
    resolved_opts = opts_conf.resolve_registered(opts_conf2)

    ap = resolved_opts.generate_parser(description='does it work?')

    ap.parse_args(['-h'])
Fynn