34

Similar to this thread for C#, I need to split a string containing the command line arguments to my program so I can allow users to easily run multiple commands. For example, I might have the following string:

-p /path -d "here's my description" --verbose other args

Given the above, Java would normally pass the following in to main:

Array[0] = -p
Array[1] = /path
Array[2] = -d
Array[3] = here's my description
Array[4] = --verbose
Array[5] = other
Array[6] = args

I don't need to worry about any shell expansion, but it must be smart enough to handle single and double quotes and any escapes that may be present within the string. Does anybody know of a way to parse the string as the shell would under these conditions?

NOTE: I do NOT need to do command line parsing; I'm already using joptsimple to do that. Rather, I want to make my program easily scriptable: the user should be able to place in a single file a set of commands, each of which would be valid on the command line. For example, they might type the following into a file:

--addUser admin --password Admin --roles administrator,editor,reviewer,auditor
--addUser editor --password Editor --roles editor
--addUser reviewer --password Reviewer --roles reviewer
--addUser auditor --password Auditor --roles auditor

Then the user would run my admin tool as follows:

adminTool --script /path/to/above/file

main() will then find the --script option and iterate over the different lines in the file, splitting each line into an array that I would then fire back at a joptsimple instance which would then be passed into my application driver.

joptsimple comes with a Parser that has a parse method, but it only supports a String array. Similarly, the GetOpt constructors also require a String[] -- hence the need for a parser.

Kaleb Pederson
  • 3
    Couldn't you just use the args array given to you in main() instead of trying to parse it yourself? – Jeff Mercado Jul 15 '10 at 19:24
  • I've updated my question to describe why I need to parse the string and how that's different from command line parsing. – Kaleb Pederson Jul 15 '10 at 19:42
  • I don't think it is any different than command line parsing, see the addendum to my answer on how I have approached something very similar to this in the past. –  Jul 15 '10 at 19:55
  • just added a short answer that you might find useful - now that you've added some explanations to your question :-) – Andreas Dolk Jul 15 '10 at 21:11

6 Answers

32

Here is a pretty easy alternative for splitting a text line from a file into an argument vector so that you can feed it into your options parser:

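import org.apache.tools.ant.types.Commandline;

public static void main(String[] args) {
    // Quotes group words into one argument; the quote characters themselves are dropped.
    String[] myArgs = Commandline.translateCommandline("-a hello -b world -c \"Hello world\"");
    for (String arg : myArgs) {
        System.out.println(arg);
    }
}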

The Commandline class is part of Ant (org.apache.tools.ant.types.Commandline). You either have to put Ant on the classpath or just copy the Commandline class into your project, which is easy because the method used is static.
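Applied to the script-file scenario from the question, the per-line loop might look roughly like the sketch below. The names `ScriptRunner` and `toArgVectors` are hypothetical, and the splitter used in `main` is only a whitespace stand-in (an assumption, so the sketch runs without Ant on the classpath). With Ant available, `Commandline::translateCommandline` should fit the same `Function<String, String[]>` parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ScriptRunner {
    /**
     * Turns each non-blank script line into its own argument vector.
     * The splitter is passed in; a method reference such as
     * Commandline::translateCommandline has the required String -> String[] shape.
     */
    static List<String[]> toArgVectors(List<String> lines, Function<String, String[]> splitter) {
        List<String[]> vectors = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {   // ignore blank lines
                vectors.add(splitter.apply(line));
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        // In a real tool these lines would come from the --script file,
        // e.g. via java.nio.file.Files.readAllLines.
        List<String> script = Arrays.asList(
                "--addUser admin --password Admin --roles administrator,editor",
                "",
                "--addUser editor --password Editor --roles editor");

        // Stand-in splitter (assumption): plain whitespace split, no quote handling.
        Function<String, String[]> splitter = s -> s.trim().split("\\s+");

        for (String[] argv : toArgVectors(script, splitter)) {
            System.out.println(Arrays.toString(argv)); // hand argv to the options parser here
        }
    }
}
```

Keeping the splitter as a parameter means the loop does not care which tokenizer is used, so it can be swapped without touching the driver.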

Andreas Dolk
  • 1
    As documentation, `translateCommandline` handles both single and double quoted strings and escapes within them but does not recognize backslash in the same way as a POSIX shell because of problems that causes on DOS based systems. – Kaleb Pederson Jul 15 '10 at 22:46
  • There is a source distribution of ant. At this point I'd take the implementation of `translateCommandline` and modify it to fit my needs. – Andreas Dolk Jul 16 '10 at 06:07
  • 1
    Careful, \t\r\n are not white spaces for this method – basin May 27 '14 at 10:48
  • 4
    Still the only way ? Anything in the core libraries ? – Mr_and_Mrs_D Jul 06 '14 at 21:33
  • 4
    Implementation (line 337): [translateCommandline](https://commons.apache.org/proper/commons-exec/apidocs/src-html/org/apache/commons/exec/CommandLine.html) – Mr_and_Mrs_D Jul 07 '14 at 13:51
10

If you need to support only UNIX-like OSes, there is an even better solution. Unlike Commandline from ant, ArgumentTokenizer from DrJava is more sh-like: it supports escapes!

Seriously, even something insane like `sh -c 'echo "\"un'\''kno\"wn\$\$\$'\'' with \$\"\$\$. \"zzz\""'` gets properly tokenized into `[sh, -c, echo "\"un'kno\"wn\$\$\$' with \$\"\$\$. \"zzz\""]`. (By the way, when run, this command outputs `"un'kno"wn$$$' with $"$$. "zzz"`.)
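To illustrate what "sh-like" means here, below is a minimal sketch of such a tokenizer. It is not DrJava's implementation; `ShellSplit` and `split` are hypothetical names, and the rules are an assumed subset of POSIX quoting: a backslash outside quotes escapes the next character, single quotes are fully literal, and inside double quotes a backslash only escapes `"`, `\`, `$` and the backtick. No expansion of any kind is performed.

```java
import java.util.ArrayList;
import java.util.List;

class ShellSplit {
    /** Splits a line into words roughly as a POSIX shell would, without any expansion. */
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inWord = false; // true once a word has started, even an empty one such as ""
        char quote = 0;         // 0 = unquoted, otherwise the active quote character
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote == '\'') {
                // Single quotes: everything is literal up to the closing quote.
                if (c == '\'') quote = 0; else cur.append(c);
            } else if (quote == '"') {
                if (c == '"') {
                    quote = 0;
                } else if (c == '\\' && i + 1 < line.length()
                        && "\"\\$`".indexOf(line.charAt(i + 1)) >= 0) {
                    cur.append(line.charAt(++i)); // escaped special character
                } else {
                    cur.append(c);                // any other backslash stays literal
                }
            } else if (c == '\\') {
                if (i + 1 >= line.length()) {
                    throw new IllegalArgumentException("trailing backslash in: " + line);
                }
                cur.append(line.charAt(++i));     // unquoted escape: take next char as-is
                inWord = true;
            } else if (c == '\'' || c == '"') {
                quote = c;
                inWord = true;
            } else if (Character.isWhitespace(c)) {
                if (inWord) {                     // whitespace ends the current word
                    words.add(cur.toString());
                    cur.setLength(0);
                    inWord = false;
                }
            } else {
                cur.append(c);
                inWord = true;
            }
        }
        if (quote != 0) throw new IllegalArgumentException("unbalanced quotes in: " + line);
        if (inWord) words.add(cur.toString());
        return words;
    }

    public static void main(String[] args) {
        // prints [-p, /path, -d, here's my description, --verbose, other, args]
        System.out.println(split("-p /path -d \"here's my description\" --verbose other args"));
    }
}
```

Because the sketch uses `Character.isWhitespace`, tabs and newlines also act as separators outside quotes. Calling `split(line).toArray(new String[0])` yields the `String[]` that an options parser expects.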

nvamelichev
8

You should use a fully featured, modern, object-oriented command line argument parser. I suggest my favorite, Java Simple Argument Parser (JSAP). The how-to-use-JSAP example uses Groovy, but it is the same for straight Java. There is also args4j, which is in some ways more modern than JSAP because it uses annotations. Stay away from the apache.commons.cli stuff; it is old and busted and very procedural and un-Java-esque in its API. I still fall back on JSAP because it is so easy to build your own custom argument handlers.

There are lots of default Parsers for URLs, Numbers, InetAddress, Color, Date, File, Class, and it is super easy to add your own.

For example here is a handler to map args to Enums:

import com.martiansoftware.jsap.ParseException;
import com.martiansoftware.jsap.PropertyStringParser;

/*
This is a StringParser implementation that maps a String to an Enum instance using Enum.valueOf()
 */
public class EnumStringParser extends PropertyStringParser
{
    public Object parse(final String s) throws ParseException
    {
        try
        {
            final Class klass = Class.forName(super.getProperty("klass"));
            return Enum.valueOf(klass, s.toUpperCase());
        }
        catch (ClassNotFoundException e)
        {
            throw new ParseException(super.getProperty("klass") + " could not be found on the classpath");
        }
    }
}

I am not a fan of configuration programming via XML, but JSAP has a really nice way to declare options and settings outside your code, so your code isn't littered with hundreds of lines of setup that clutter and obscure the real functional code. See my link on how to use JSAP for an example; it takes less code than any of the other libraries I have tried.

This is a direct solution to your problem as clarified in your update: the lines in your "script" file are still command lines. Read them in from the file line by line and call JSAP.parse(String);.

I use this technique to provide "command line" functionality to web apps all the time. One particular use was in a Massively Multiplayer Online Game with a Director/Flash front end, where we enabled executing "commands" from the chat line and used JSAP on the back end to parse them and execute code based on what it parsed. That is very much like what you want to do, except you read the "commands" from a file instead of a socket. I would ditch joptsimple and just use JSAP; you will really get spoiled by its powerful extensibility.

  • JSAP is the first parser I've seen to accept a string but, unfortunately, it returns a `JSAPResult` rather than a `String[]`, so I won't be able to use it without switching my command line parsing library :(. – Kaleb Pederson Jul 15 '10 at 19:45
  • a `String[]` is pretty useless, the entire reason for JSAP result is it does all the parsing and rules enforcement and checking for you. I think if you really step back from where you are some rethinking of your approach and some refactoring will really be beneficial. See my update based on your last edit. –  Jul 15 '10 at 19:54
  • I don't want to build a shell string parser. `line.split(" ")` isn't nearly intelligent enough. It would die on the parameter that creates `Array[3]` as I indicated in my post as parameters may have both spaces and escape sequences within them. I need a full parser to handle all the possibilities -- but I need a string to String[] parser, rather than a command line parser. – Kaleb Pederson Jul 15 '10 at 20:04
  • 1
    JSAP might take a couple of goes at reading through the docs to understand the options it supplies, but it's a very good solution for command-line parsing requirements and works well - definitely recommended... – Gwyn Evans Jul 15 '10 at 20:04
  • Maybe switching is the best thing you can do; joptsimple is probably too "simple" for your requirements. –  Jul 15 '10 at 20:05
  • A `String[]` is extremely useful to me as I already have a `joptsimple` command line parser that handles all the commands that would be issued within my file. I need only to convert the string into a `String[]` in order to be able to provide easy scripting of multiple commands. – Kaleb Pederson Jul 15 '10 at 20:05
  • JSAP's `CommandLineTokenizer` is very close to what I need (and may be sufficient). It parses the string like Windows 2000 instead of like a Unix shell. – Kaleb Pederson Jul 15 '10 at 20:18
  • Switching that little bit of code to JSAP will be much less work than writing an ANTLR grammar to parse a UNIX style command line :-) –  Jul 15 '10 at 20:18
6
// Needs: import java.util.ArrayList; import java.util.StringTokenizer;
/**
 * [code borrowed from ant.jar]
 * Crack a command line.
 * @param toProcess the command line to process.
 * @return the command line broken into strings.
 * An empty or null toProcess parameter results in a zero sized array.
 */
public static String[] translateCommandline(String toProcess) {
    if (toProcess == null || toProcess.length() == 0) {
        //no command? no string
        return new String[0];
    }
    // parse with a simple finite state machine

    final int normal = 0;
    final int inQuote = 1;
    final int inDoubleQuote = 2;
    int state = normal;
    final StringTokenizer tok = new StringTokenizer(toProcess, "\"\' ", true);
    final ArrayList<String> result = new ArrayList<String>();
    final StringBuilder current = new StringBuilder();
    boolean lastTokenHasBeenQuoted = false;

    while (tok.hasMoreTokens()) {
        String nextTok = tok.nextToken();
        switch (state) {
        case inQuote:
            if ("\'".equals(nextTok)) {
                lastTokenHasBeenQuoted = true;
                state = normal;
            } else {
                current.append(nextTok);
            }
            break;
        case inDoubleQuote:
            if ("\"".equals(nextTok)) {
                lastTokenHasBeenQuoted = true;
                state = normal;
            } else {
                current.append(nextTok);
            }
            break;
        default:
            if ("\'".equals(nextTok)) {
                state = inQuote;
            } else if ("\"".equals(nextTok)) {
                state = inDoubleQuote;
            } else if (" ".equals(nextTok)) {
                if (lastTokenHasBeenQuoted || current.length() != 0) {
                    result.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(nextTok);
            }
            lastTokenHasBeenQuoted = false;
            break;
        }
    }
    if (lastTokenHasBeenQuoted || current.length() != 0) {
        result.add(current.toString());
    }
    if (state == inQuote || state == inDoubleQuote) {
        throw new RuntimeException("unbalanced quotes in " + toProcess);
    }
    return result.toArray(new String[result.size()]);
}
qiangbro
3

Expanding on Andreas_D's answer, instead of copying, use CommandLineUtils.translateCommandline(String toProcess) from the excellent Plexus Common Utilities library.

Sagi
-2

I use the Java Getopt port to do it.

Andy