
Some program command lines have state that applies to the arguments following the state-setting option (the '--' argument to rm and touch is an example; ffmpeg is infamous for stateful arg parsing). For example:

readmulticsv --cols 1,2,4 file1.csv --date_format "%Y-%m-%d" file2.csv --cols 4,3,9 file3.csv file4.csv

Here, it will pull columns 1, 2, and 4 from file1.csv and file2.csv and then pull columns 4, 3, and 9, in that order, from file3.csv and file4.csv. Further, it will start interpreting dates (in the first column of the --cols argument) in file1.csv with a default "%m/%d/%Y" format, but switch to "%Y-%m-%d" for the remaining files. What I want is a list of lists, where each element list has the file name and the values of the relevant state variables:

[["file1.csv", "1,2,4", "%m/%d/%Y"],
 ["file2.csv", "1,2,4", "%Y-%m-%d"],
...
]

Implementing this is straightforward if you walk sys.argv manually.

Is there a way to do this with argparse? My program uses argparse for many other options and its nice help feature, and the whole code is written around its Namespace object. I could use parse_known_args() and leave the rest for a "walk" approach, but that excludes --cols, --date_format, and the files from the help and Namespace. I've tried figuring out an Action(), but I'm not sure how to proceed there. The docs for setting that up aren't super clear to me and I don't see how to access the existing state.

Is there an alternative arg parser that can do it all (help, defaults, a Namespace)?

My application is a program to calculate stock basis, gain, and growth by reading CSV transaction files, where the investments have transferred between brokers with different file formats and format changes over several decades. I could write a converter for each of the old formats, but I'd rather write a single program that works directly from the source data.

Thanks,

--jh--

Joe
  • This sounds like a one-off project, so if no one offers the solution you're looking for, consider loading it into a relational DB and relying on it to normalize your data. Just an idea. Good luck! – shellter Jul 12 '20 at 18:04
  • Each flagged option (e.g. '--cols') is processed independently. An 'append' `action` class can be used to collect the inputs from separate 'cols' into one list. Those could be strings or lists, depending on the `nargs`. If you want to write your own `Action` subclass, look at how the existing subclasses are defined. For example, the 'append' one does fetch existing values from the namespace. Over the years I and others on SO have suggested `Action` subclasses. Also keep in mind that you have to explain this to your users (or yourself six months from now). – hpaulj Jul 12 '20 at 19:17
  • The program is a one-off, but the problem of stateful command lines isn't. Plenty of programs have them, but in days of searching, I have not seen a Python example of parsing one. The append action does not record the state of other options. The difficulty is associating the most recent --cols and --date_format with each file as the file is appended. Explaining it is easy: the most recent --cols and --date_format apply for each file encountered. Reading the command line left to right, those settings remain in effect until changed. – Joe Jul 12 '20 at 20:24
  • The argparse parsing model treats flagged arguments as independent and unordered. Treating them as a sequence of commands or states goes against that design. – hpaulj Jul 12 '20 at 20:44
  • It's a feature that it can take them in any order, but that doesn't make it a fundamental design criterion that it can't or shouldn't be made to treat them in the order received. There are many Unix commands that use ordered args (I cited ffmpeg), and I don't think you can find a command-line way to accomplish my goal without ordered args in a way that's easier for the user, since the files are ordered (but I'm happy to be proven wrong). Argparse is presented as a general argument parser, so it should handle this, but I'm happy to try a different package. I haven't found one, though. – Joe Jul 13 '20 at 00:41
  • The rm and touch commands use a stateful arg list, as well, with their '--' argument for dealing with filenames beginning with a '-'. Added to the top. – Joe Jul 13 '20 at 00:54
  • Step through manually and split the argument list into your nested arrangement. You can then run argparse with all its features and stateless design on each sublist independently. This seems extremely straightforward and mostly supported out of the box. – Mad Physicist Jul 13 '20 at 00:57

1 Answer


One of the big things that argparse adds, compared to the earlier optparse and getopt, is the ability to handle positionals. It uses a regex-like syntax and pattern matching to allocate strings (from the sys.argv list) to positionals and to optional (flagged) arguments.

The basic parsing routine alternates between consuming as many positionals as it can and then parsing one optional.

With:

--cols 1,2,4 file1.csv --date_format "%Y-%m-%d" file2.csv --cols 4,3,9 file3.csv file4.csv

I can imagine defining a

parser.add_argument('--cols', nargs='+', action='append')
parser.add_argument('--date_format', nargs='+', action='append')

resulting in

args.cols = [['1,2,4','file1.csv'], ['4,3,9', 'file3.csv', 'file4.csv']]
args.date_format = [["%Y-%m-%d", 'file2.csv']]

argparse does not retain any information on how the cols and date_format options are interleaved.
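
As a runnable check of the above, here is a minimal sketch that feeds the example command line to those two add_argument calls. The argv list is typed out by hand instead of being read from sys.argv; that is the only assumption.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--cols', nargs='+', action='append')
parser.add_argument('--date_format', nargs='+', action='append')

# The example command line, split the way the shell would split it.
argv = ['--cols', '1,2,4', 'file1.csv',
        '--date_format', '%Y-%m-%d', 'file2.csv',
        '--cols', '4,3,9', 'file3.csv', 'file4.csv']

args = parser.parse_args(argv)
print(args.cols)         # one sublist per --cols occurrence
print(args.date_format)  # one sublist per --date_format occurrence
```

Each option only sees its own occurrences, so nothing in args says that file2.csv came after the first --cols and before the second.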

I was tempted to collect the 'file' names in positionals, but there isn't a way of recording which positionals fell between which optionals.

In a recent SO answer I suggested prepopulating args with lists, e.g.

 argparse.Namespace(cols=[[]], date_format=[["%m/%d/%Y"]])

and changing the cols action to replace the last empty list. A new date_format would update both cols and date_format to start a new "state".

Python using diferent options multiple times with argparse

In the default Action subclasses, __call__ writes the new value(s) to the attribute (with setattr), overwriting the default or whatever was written before. The append subclass fetches the attribute (getattr), appends to it, and writes it back. The default classes only work with their own dest.

The only "state" that an Action has access to is the namespace. But that's probably enough if you design the custom action subclasses to fetch and save the appropriate attributes. Custom actions can even write and read attributes that aren't spelled out in the add_argument calls. (In the documentation, set_defaults is used to add a function attribute for subparsers.)
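
A minimal sketch of such an action, assuming the nargs='+' layout from earlier (each option takes its value followed by any file names that come before the next option). StatefulFiles, parse_stateful, and the files attribute are hypothetical names chosen for this example, not part of argparse. The action stores the first string under its own dest and then records every remaining string together with a snapshot of the current cols and date_format.

```python
import argparse

class StatefulFiles(argparse.Action):
    """Hypothetical action: the first value updates this option's state,
    every following value is a file name recorded with the current state."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Update this option's own state (cols or date_format).
        setattr(namespace, self.dest, values[0])
        # Record each trailing file with the state in effect right now.
        for name in values[1:]:
            namespace.files.append(
                [name, namespace.cols, namespace.date_format])

parser = argparse.ArgumentParser()
parser.add_argument('--cols', nargs='+', action=StatefulFiles,
                    metavar=('COLS', 'FILE'))
parser.add_argument('--date_format', nargs='+', action=StatefulFiles,
                    default='%m/%d/%Y', metavar=('FORMAT', 'FILE'))

def parse_stateful(argv):
    # Prepopulate the namespace so every parse starts with a fresh list.
    return parser.parse_args(argv, namespace=argparse.Namespace(files=[]))

args = parse_stateful(['--cols', '1,2,4', 'file1.csv',
                       '--date_format', '%Y-%m-%d', 'file2.csv',
                       '--cols', '4,3,9', 'file3.csv', 'file4.csv'])
for row in args.files:
    print(row)
```

One limitation of this sketch: a file name given before the first option is not attached to any option, so argparse would report it as unrecognized.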

Another customization approach is to define a new Namespace class. The default one is simple, with just a means of displaying itself. Where possible, argparse uses getattr, hasattr, and setattr to interact with the namespace, so it imposes minimal constraints on that class.
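
As an illustration of the namespace route, here is a rough sketch of a Namespace subclass that logs every assignment in the order argparse makes it. LoggingNamespace, parse_history, and the history attribute are hypothetical names for this example. The options use the plain store action with default=argparse.SUPPRESS so that no default assignments end up in the log.

```python
import argparse

class LoggingNamespace(argparse.Namespace):
    """Hypothetical namespace that remembers the order of assignments."""

    def __setattr__(self, name, value):
        # Write the log straight into __dict__ so this does not recurse.
        self.__dict__.setdefault('history', []).append((name, value))
        super().__setattr__(name, value)

parser = argparse.ArgumentParser()
parser.add_argument('--cols', nargs='+', default=argparse.SUPPRESS)
parser.add_argument('--date_format', nargs='+', default=argparse.SUPPRESS)

def parse_history(argv):
    ns = parser.parse_args(argv, namespace=LoggingNamespace())
    return ns.history

history = parse_history(['--cols', '1,2,4', 'file1.csv',
                         '--date_format', '%Y-%m-%d', 'file2.csv',
                         '--cols', '4,3,9', 'file3.csv', 'file4.csv'])
for dest, values in history:
    print(dest, values)
```

The log keeps the left-to-right order of the options, which is the information the stock Namespace discards; turning it into the list of lists is then a post-processing step.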

So between type functions, action subclasses, namespace classes, and formatter classes, there's a lot of room for customizing argparse. But you do need to study the argparse.py code, and recognize that there's little you can do to change the basic parsing sequence.

Processing sys.argv before parsing is another tool, as is post processing the args namespace.
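
A minimal sketch of the pre-processing route, under stated assumptions: split_stateful and the --verbose flag are hypothetical, and every option other than --cols and --date_format is assumed to be a simple flag (or written as --opt=value), since this walk cannot tell an option's separate value from a file name.

```python
import argparse

def split_stateful(argv):
    """Walk argv left to right; return (records, rest).

    records holds [file, cols, date_format] for each file name;
    rest holds everything else, to be handed to argparse."""
    state = {'--cols': None, '--date_format': '%m/%d/%Y'}
    records, rest = [], []
    args = iter(argv)
    for arg in args:
        if arg in state:
            value = next(args, None)
            if value is None:
                raise ValueError(f'{arg} requires a value')
            state[arg] = value
        elif arg.startswith('-'):
            rest.append(arg)  # left for argparse
        else:
            records.append([arg, state['--cols'], state['--date_format']])
    return records, rest

parser = argparse.ArgumentParser()
parser.add_argument('--verbose', action='store_true')  # hypothetical flag

records, rest = split_stateful(
    ['--verbose', '--cols', '1,2,4', 'file1.csv',
     '--date_format', '%Y-%m-%d', 'file2.csv'])
args = parser.parse_args(rest)
args.files = records  # post-process: attach the records to the namespace
print(args)
```

The cost is the one raised in the question: options handled in the walk do not appear in the argparse help unless they are also declared on the parser.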

hpaulj
  • Thanks, this is very helpful, and I appreciate all the time and thought you've put into this. I have spent some time studying the argparse.py code and your posts. What hampers me is my own inexperience in writing Python classes. That's why I asked here. I can see in principle that an action AppendState could be written that stores, say, positional args while recording the state of a list of other args, producing the list of lists I mentioned. It's probably only a few lines. I think it would make a great addition to argparse.py. I'll give it a shot, but can't in the next couple of days. – Joe Jul 14 '20 at 02:26