
We have a bunch of shell scripts, each containing multiple calls to specific tools with their respective command-line arguments, e.g.:

some_tool -a param -b param -c param,param,param
some_tool -d param -f param
some_other_tool -x another_param -Y -z params,params,params
etc.

How can text files containing these calls be parsed and processed cleanly in Python? Is there a library that is intended specifically to parse Unix-like command line invocations? I'm thinking of shlex but this seems to only address a part of it (things like quoted arguments).

NOTE: I'm not interested in providing a CLI to the tool that will process the files, so argparse and the like are not what I'm looking for.
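
To illustrate the point about `shlex`, here is a minimal sketch of what it does and does not do. The sample line is made up for illustration; `shlex.split` tokenizes it the way a POSIX shell would, but leaves open which tokens are switches and which are their arguments.

```python
import shlex

# Hypothetical invocation with a quoted argument, for illustration only
line = 'some_tool -a "quoted param" -c param_3,param_4'

# shlex handles quoting and whitespace, returning a flat list of tokens
tokens = shlex.split(line)
print(tokens)
# ['some_tool', '-a', 'quoted param', '-c', 'param_3,param_4']
```

The result is a flat token list: pairing `-a` with `quoted param`, or splitting the comma-separated value, still has to be done separately.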

Nobilis
  • Why are you not looking for `argparse`? You can use `argparse` to parse arguments without *using* those arguments. That's unless you want this to be generic and not require you teaching argparse what arguments to expect, of course (just asking for clarification) : ) – Thomas Orozco Oct 05 '15 at 09:52
  • I thought argparse collects arguments passed to its parent tool (presumably from `sys.argv`) and is not concerned with parsing arbitrary strings – Nobilis Oct 05 '15 at 09:53
  • You can actually use `argparse` with arbitrary arguments using `parser.parse_args(["my", "--arguments", "go", "here"])`. You still need to instantiate the parser and tell it what options to expect, though. You probably also want to subclass the parser so it doesn't output help and exit the program on a parse error. – Thomas Orozco Oct 05 '15 at 09:54
  • Is there a way to do that without telling the parser what to expect? Perhaps something that neatly splits a string (e.g. `-t arg`), works out what is a switch (`-t`) and what its argument is (`arg`)? – Nobilis Oct 05 '15 at 09:58
  • When you say "parsed and processed cleanly" ... exactly what information are you wanting to extract? – donkopotamus Oct 05 '15 at 10:13
  • @donkopotamus I guess the tool name, and the command line switches separately from their arguments. Say for `tool -a param -b param_2 -c param_3, param4` I get something like `('tool', ('-a', 'param'), ('-b', 'param_2'), ('-c', ('param_3', 'param_4')))`. I realise it's fairly unrealistic to expect something like this but I hope it gives an idea of the kind of input and respective output I'm looking for. – Nobilis Oct 05 '15 at 10:18
  • If you know the possible options for each tool, then the `argparse` route is probably best. If you don't, then it's not a well specified problem. e.g. `tool -a -b name` has different interpretations depending on whether `-a` is a switch, or an option that must have an argument (in which case `-b` was the value ...) – donkopotamus Oct 05 '15 at 10:24
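
The `argparse` route suggested in the comments above could look roughly like the sketch below. It assumes the options of `some_tool` are known in advance; the option definitions, the `NonExitingParser` class and the `split_csv` helper are hypothetical names introduced for illustration. Overriding `error` makes the parser raise an exception instead of printing usage and exiting.

```python
import argparse
import shlex


class NonExitingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of printing usage and exiting."""

    def error(self, message):
        raise ValueError(message)


def split_csv(value):
    """Turn 'a,b,c' into ('a', 'b', 'c')."""
    return tuple(value.split(","))


# Assumed description of some_tool's options, for illustration only
parser = NonExitingParser(prog="some_tool", add_help=False)
parser.add_argument("-a")
parser.add_argument("-b")
parser.add_argument("-c", type=split_csv)

line = "some_tool -a param -b param_2 -c param_3,param_4"

# Tokenize like a shell, then hand everything after the tool name to argparse
tool, *args = shlex.split(line)
namespace = parser.parse_args(args)

print(tool, vars(namespace))
```

Whether `-a` is declared as an option taking a value (as here) or as a switch with `action="store_true"` is exactly the information that resolves the `tool -a -b name` ambiguity mentioned above.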

1 Answer


Going by your comment, you want output like `('tool', ('-a', 'param'), ('-b', 'param_2'), ('-c', ('param_3', 'param_4')))`. That suggests reading the file as a collection of strings, with one command per line, and separating each line into an organized list or tuple.

In that case, you could use regular expressions to segment each line into the parts you expect from such a pattern. For example:

import re

# Compiled regular expression for the command/tool name
regex_command = re.compile(r"^\s*(\w+)", re.IGNORECASE)

# Compiled regex for "-option_name params": an option, optionally followed
# by one whitespace-separated value that does not itself start with a dash
regex_options = re.compile(r"(?<!\S)(-\w+)(?:\s+(?!-)(\S+))?",
                           re.IGNORECASE)

# This will hold the found commands/tools in each line
parsed_tools = []

# Loop through each line of the file; `text` is the file's contents as a
# string (iterating over the open file object works as well)
for line in text.split("\n"):
    # Skip blank lines and lines that do not start with a tool name
    command = regex_command.match(line)
    if command is None:
        continue

    # This will hold the found tool/command in the current line,
    # starting with the command/tool name
    parsed_tool = [command.group(1)]

    # Find the line's options and their parameters with the second regex;
    # each match is an (option_name, option_params) pair, where
    # option_params is an empty string for a bare switch
    for option_name, option_params in regex_options.findall(line):
        # The parameters may be joined by commas, so separate them
        # even further; otherwise only append the option name
        parsed_tool.append((option_name, tuple(option_params.split(",")))
                           if option_params else option_name)

    # Append each parsed_tool into the overall list
    parsed_tools.append(parsed_tool)

In the code above, I use compiled regular expressions from the re module, with the case-insensitive flag set. The first one matches the tool name at the very start of the line (the group() method returns the only result I'm expecting). The second one "finds all" matches of "-option_name params"; I loop through those results and separate each into the option name and its comma-separated parameters.

You can start learning more about regular expressions here. Adjust the regular expressions to suit the patterns you expect from the file.

Community
  • I'm aware that I can use regular expressions, which is why I emphasised that I'm looking for a **library**, as regexes are dirty and messy. It looks like the most appropriate way to do it is to create separate instances of `argparse` for each tool that I need to parse. – Nobilis Oct 05 '15 at 13:09
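
One way the "separate parser per tool" idea from the last comment might be sketched is shown below. The option definitions are assumptions inferred from the example lines in the question, and `comma_list`, `build_parsers` and `parse_script` are hypothetical helper names, not part of any library.

```python
import argparse
import shlex


def comma_list(value):
    """Turn 'a,b,c' into ('a', 'b', 'c')."""
    return tuple(value.split(","))


def build_parsers():
    """Return one parser per known tool, keyed by tool name."""
    # Assumed options, inferred from the example lines in the question
    some_tool = argparse.ArgumentParser(prog="some_tool", add_help=False)
    for flag in ("-a", "-b", "-d", "-f"):
        some_tool.add_argument(flag)
    some_tool.add_argument("-c", type=comma_list)

    other = argparse.ArgumentParser(prog="some_other_tool", add_help=False)
    other.add_argument("-x")
    other.add_argument("-Y", action="store_true")  # a switch without a value
    other.add_argument("-z", type=comma_list)

    return {"some_tool": some_tool, "some_other_tool": other}


def parse_script(text, parsers):
    """Parse every line that starts with a known tool name."""
    calls = []
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if not tokens or tokens[0] not in parsers:
            continue  # blank line, comment, or a tool we have no parser for
        tool, args = tokens[0], tokens[1:]
        options = vars(parsers[tool].parse_args(args))
        # Keep only the options that were actually given on this line
        given = {name: value for name, value in options.items()
                 if value is not None and value is not False}
        calls.append((tool, given))
    return calls


script = """\
some_tool -a param -b param -c param,param,param
some_tool -d param -f param
some_other_tool -x another_param -Y -z params,params,params
"""

calls = parse_script(script, build_parsers())
for call in calls:
    print(call)
```

Lines whose first token is not a known tool are skipped here; raising an error instead would be an equally reasonable choice, depending on how strict the processing needs to be.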