Is there a good (ideally CPAN) way to process arbitrary command line options with Perl?

E.g., take a string "-a 1 -b 'z -x' -c -d 3 4" and generate a Getopt::Long-like data structure:

 { a=>1, b=>"z -x", c=>1, d=>[3,4] }  # d=>"3 4" is acceptable

The caveats are:

  1. The set of options is NOT known in advance, so we seemingly can't use Getopt::Long as-is.

  2. The values themselves can contain other options, so simply scanning the string for a pattern like #\b+-(\S+)\b# to find all the options in THAT string also seems impossible. It is further complicated by the fact that some parameters are of the =s type, some =s@, and some of the "-x a b c" type.

  3. Moreover, even if we could do #2, does GetOptionsFromString tokenize correctly, respecting quoted values?


NOTE: Assume for the purposes of this exercise that ALL arguments are "options". In other words, if you split the string into (possibly quoted) tokens, the structure is always

 "-opt1 arg1a [arg1b] -opt2 ....".

In other words, any word/token that starts with a dash is a new option, and all the subsequent words/tokens that do NOT start with a dash are values for that option.

DVK
  • If you don't know the options in advance, how can you tell how to parse the line? If `-b` is not intended to take an option at all, then `'z -x'` starts the non-option ('file name') arguments, and `-c` and `-d` are also non-options (unless you like working without POSIXLY_CORRECT, but that just makes matters even less determinate). I really don't think that anyone can do it without knowing what's permitted. Your `-d 3 4` becomes problematic; `-d -3 -4` even more so; are the negative numbers after `-d` to be treated as options or negative integers? – Jonathan Leffler Nov 28 '12 at 20:29
  • @JonathanLeffler - the assumption is that any arguments are option arguments, of course. Otherwise it's an unsolvable problem. The naive rule is that any "token/word" starting with a dash is a new option and all the following tokens not starting with dashes are values for that option. – DVK Nov 28 '12 at 20:31
  • I note that [Getopt::Long](http://search.cpan.org/perldoc?Getopt%3A%3ALong) says: _A special entry `GetOptionsFromString` can be used to parse options from an arbitrary string. ... The contents of the string are split into arguments using a call to [Text::ParseWords](http://search.cpan.org/perldoc?Text%3A%3AParseWords)::shellwords._ I'm not sure that the description of Text::ParseWords::shellwords is reassuring. – Jonathan Leffler Nov 28 '12 at 20:36
  • I would just use Text::ParseWords and then iterate through the results – Joel Berger Nov 28 '12 at 20:38
  • You can use `Getopt::Long`; you'd call your hypothetical `Getopt::Random` module to return you an option string to be passed to `Getopt::Long`. The trickiest bit will be getting `GetOptions` to parse into appropriate variables. Given that the set of options is random, the set of variables you need to collect the values from is similarly random. So, you probably end up using refs to parts of a hash so that you can iterate over the keys of the hash to find the option names and over the corresponding value to pull the command line information. Non-trivial, but not actually impossible. I think! – Jonathan Leffler Nov 28 '12 at 20:46
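
The tokenizing question raised above can be checked directly. This is a minimal sketch, assuming only that `Text::ParseWords` (a core module) is available; it feeds the example string from the question to `shellwords` and joins the resulting tokens with `|` so the token boundaries are visible.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# The example string from the question; 'z -x' is a single quoted value.
my @tokens = shellwords(q{-a 1 -b 'z -x' -c -d 3 4});

# Join with '|' so it is obvious where one token ends and the next begins.
print join('|', @tokens), "\n";
# prints: -a|1|-b|z -x|-c|-d|3|4
```

The quoted value survives as one token with its quotes stripped, so a later pass that looks only at the first character of each token will not mistake the `-x` inside it for an option.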

2 Answers


A quick example using Text::ParseWords and a simple state machine.

#!/usr/bin/env perl

use strict;
use warnings;

use Text::ParseWords qw/shellwords/;

my $str = q{-a 1 -b 'z -x' -c -d 3 4};
my $data = parse($str);

use Data::Printer;
p $data;

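sub parse {
  my $str = shift;

  # Split into shell-style tokens; a quoted value such as 'z -x' stays whole
  my @tokens = shellwords $str;

  my %data;
  my @keys;              # every option seen, in order of appearance
  my $key = '_unknown';  # bucket for values that appear before any option
  foreach my $token (@tokens) {
    # A leading dash starts a new option; strip it and remember the name
    if ($token =~ s/^\-//) {
      $key = $token;
      push @keys, $key;
      next;
    }

    # Otherwise the token is a value for the current option:
    # first value is stored as a scalar, later ones promote it to an array ref
    if ( ref $data{$key} ) {
      push @{ $data{$key} }, $token;
    } elsif (defined $data{$key}) {
      $data{$key} = [ $data{$key}, $token ];
    } else {
      $data{$key} = $token;
    }
  }

  # Options that received no value at all (e.g. -c) are treated as flags
  foreach my $key (@keys) {
    next if defined $data{$key};
    $data{$key} = 1;
  }

  return \%data;
}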
Joel Berger
  • Yeah, we ended up doing something similar for now but without Text::ParseWords. I was soooo hoping for a non-self-bicycling solution :) – DVK Nov 28 '12 at 20:49
  • yeah, see there's always odd stuff, mine doesn't handle `-c` correctly. That's why rolling by hand is dangerous (as you well know!) – Joel Berger Nov 28 '12 at 20:53
  • fixed but getting hack-y-er all the time :/ – Joel Berger Nov 28 '12 at 20:57

Before I knew about Getopt::Long and the wisdom of using it, I rolled my own command line options processor that would take arbitrary arguments and populate a global hashtable. The rules were:

Switches with a single letter (-A .. -Z, -a .. -z)

-n               sets  $args{"n"} = 1
-nfoo            sets  $args{"n"} = "foo"

Switches with more than one letter

--foo            sets  $args{"foo"} = 1
--foo=bar        sets  $args{"foo"} = "bar"
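
As an illustration only, the four rules above could be sketched roughly like this. The `parse_switches` name and the regexes are assumptions made for this example, not the code of the removed CPAN module; arguments that match neither form are simply ignored here.

```perl
use strict;
use warnings;

# Hypothetical helper (not the original module): applies the rules listed above
# to a list of arguments and returns a hash ref of the switches found.
sub parse_switches {
    my @argv = @_;
    my %args;
    for my $arg (@argv) {
        if ($arg =~ /^--([^=]+)(?:=(.*))?$/s) {
            # --foo sets 1, --foo=bar sets "bar"
            $args{$1} = defined $2 ? $2 : 1;
        }
        elsif ($arg =~ /^-([A-Za-z])(.*)$/s) {
            # -n sets 1, -nfoo sets "foo"
            $args{$1} = length $2 ? $2 : 1;
        }
        # Assumption: anything else is silently skipped in this sketch.
    }
    return \%args;
}

my $args = parse_switches(qw(-n -ofoo --verbose --name=bar));
# $args now holds: n => 1, o => "foo", verbose => 1, name => "bar"
```

Because nothing is declared up front, any switch name is accepted, which is exactly the property that makes this style convenient for experiments and risky for typos.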

The joy and the hurt of this approach was that you could quickly experiment with new command line options, changing code at the point where the option would be used, without having to edit the call to GetOptions or allocate another variable:

 ... line 980 ...
 if ($args{"do-experimental-thing"}) {
     # new code
     do_experimental_thing();
 } else {
     do_normal_thing();
 }

This was the first module I uploaded to CPAN and I've since removed it, but the BackPAN has a long memory.

mob