
I have been working on a program that mimics a shell terminal, and I've run into an implementation issue that is harder than I anticipated. Basically, I'm trying to split an input line into arguments, much as the shell does before passing them to an executable. So, given an input like:

$> ./foo some arguments

One would expect the arguments passed to the program to be an array like (assuming C/C++):

char ** argv = {"foo", "some", "arguments"}

However, if the arguments were:

$> ./foo "My name is foo" bar

The array would be:

char ** argv = {"foo", "My name is foo", "bar"}

Can anyone suggest an efficient way to implement this, such that the interface is like:

vector<string> splitArgs(string allArgs); or string[] splitArgs(string allArgs);

I can, of course, simply iterate and switch between states of 'reading words'/'reading quoted text', but I feel that that's not as effective as it could be. I also toyed with the idea of regex, but I'm not familiar enough with how this is done in C++. For this project, I do have the boost libraries installed too, if that helps.

Thanks! RR

Roadrunner-EX
  • "but I feel that that's not as effective as it could be"... really, you're better off just doing it and getting a working shell. Anyway - since you've asked - check http://stackoverflow.com/questions/541561/using-boost-tokenizer-escaped-list-separator-with-different-parameters for a solution using boost tokenizer. – Tony Delroy Apr 04 '11 at 04:54
  • Just step through each character and see what you have. [Here's](http://www.blackbeltcoder.com/Articles/strings/a-c-command-line-parser) how I did it in C#. I'm not sure RegEx will give you what you need here. – Jonathan Wood Apr 04 '11 at 04:58
  • Excellent, thanks guys. I think that's what I wanted to know. – Roadrunner-EX Apr 04 '11 at 05:08
  • *Implementing* it is step **2**. Step 1 is *defining* it. Check the documentation for your favorite shell to find a definition that you can work from. Things to consider: multiple kinds of quotation marks; parentheses; I/O redirection; backslashes. (Also, do you really mean to strip the first two characters off the first token? Why?) – Rob Kennedy Apr 04 '11 at 06:14
  • Might be related: http://stackoverflow.com/q/21959706/544721 – Grzegorz Wierzowiecki Apr 30 '16 at 18:20
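
The character-by-character approach suggested in the comments could look roughly like the sketch below. It is only an illustration under assumed rules, not a full shell grammar: spaces and tabs separate arguments, double quotes group text and are dropped, and a backslash takes the next character literally. The name `splitArgs` comes from the question; everything else is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper matching the interface asked for in the question.
// Assumed rules (NOT a complete shell grammar):
//   - spaces and tabs separate arguments,
//   - double quotes group text and are removed from the result,
//   - a backslash makes the following character literal,
//   - an unterminated quote simply runs to the end of the input.
std::vector<std::string> splitArgs(const std::string& allArgs) {
    std::vector<std::string> args;
    std::string current;
    bool inQuotes = false;  // currently inside "..."
    bool inToken = false;   // a token has started (so "" yields an empty argument)

    for (std::size_t i = 0; i < allArgs.size(); ++i) {
        const char c = allArgs[i];
        if (c == '\\' && i + 1 < allArgs.size()) {
            current += allArgs[++i];  // take the escaped character as-is
            inToken = true;
        } else if (c == '"') {
            inQuotes = !inQuotes;     // toggle state, drop the quote itself
            inToken = true;
        } else if (!inQuotes && (c == ' ' || c == '\t')) {
            if (inToken) {            // separator ends the current token
                args.push_back(current);
                current.clear();
                inToken = false;
            }
        } else {
            current += c;             // ordinary character
            inToken = true;
        }
    }
    if (inToken) {
        args.push_back(current);      // flush the last token
    }
    return args;
}
```

With these rules, `splitArgs("./foo \"My name is foo\" bar")` yields the three strings `./foo`, `My name is foo` and `bar`. Single quotes, redirection and the other points raised above are deliberately left out.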

2 Answers


I still sometimes use this plain C utility function for this. I mostly use it on embedded systems where the standard library is very limited, so most of the code could be made more efficient with standard library facilities, but the basic technique stays the same: mark the quoted parts of the string before parsing, break the string into separate tokens, and finally remove the markers from the individual parts.

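#include <string.h>  /* strchr, strlen */

/**
 * Split a line into separate words.
 *
 * Terminates the first word of pLine in place and stores a pointer to
 * the start of the next word in *pArgs (NULL if there is none).
 */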
static void splitLine(char *pLine, char **pArgs) {
    char *pTmp = strchr(pLine, ' ');

    if (pTmp) {
        *pTmp = '\0';
        pTmp++;
        while ((*pTmp) && (*pTmp == ' ')) {
            pTmp++;
        }
        if (*pTmp == '\0') {
            pTmp = NULL;
        }
    }
    *pArgs = pTmp;
}



/**
 * Breaks up a line into multiple arguments.
 *
 * @param io_pLine Line to be broken up (modified in place).
 * @param o_pArgc Number of components found.
 * @param o_pArgv Array of individual components; must have room for C_MAXARGS entries.
 */
static void parseArguments(char *io_pLine, int *o_pArgc, char **o_pArgv) {
    char *pNext = io_pLine;
    size_t i;
    int j;
    int quoted = 0;
    size_t len = strlen(io_pLine);

    // Protect spaces inside quotes, but lose the quotes
    for(i = 0; i < len; i++) {
        if ((!quoted) && ('"' == io_pLine[i])) {
            quoted = 1;
            io_pLine[i] = ' ';
        } else if ((quoted) && ('"' == io_pLine[i])) {
            quoted = 0;
            io_pLine[i] = ' ';
        } else if ((quoted) && (' ' == io_pLine[i])) {
            io_pLine[i] = '\1';
        }
    }

    // init (MY_memset and C_MAXARGS are defined elsewhere in the project)
    MY_memset(o_pArgv, 0x00, sizeof(char*) * C_MAXARGS);
    *o_pArgc = 1;
    o_pArgv[0] = io_pLine;

    while ((NULL != pNext) && (*o_pArgc < C_MAXARGS)) {
        splitLine(pNext, &(o_pArgv[*o_pArgc]));
        pNext = o_pArgv[*o_pArgc];

        if (NULL != o_pArgv[*o_pArgc]) {
            *o_pArgc += 1;
        }
    }

    // Restore the protected spaces inside each argument
    for(j = 0; j < *o_pArgc; j++) {
        len = strlen(o_pArgv[j]);
        for(i = 0; i < len; i++) {
            if('\1' == o_pArgv[j][i]) {
                o_pArgv[j][i] = ' ';
            }
        }
    }
}
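
For comparison, here is a rough sketch of the same mark/split/restore idea written against the C++ standard library, since the question asked for a `vector<string>` interface. It is an assumption-laden illustration and not a drop-in replacement for the C code above: the name `splitMarked` is made up, the input is assumed never to contain the marker byte `'\1'`, and only spaces inside quotes are protected (a tab inside quotes would still split).

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical C++ variant of the mark/split/restore technique.
// Assumes the input never contains the marker byte '\1'.
std::vector<std::string> splitMarked(std::string line) {
    const char marker = '\1';
    bool quoted = false;

    // Step 1: replace quotes with spaces and protect spaces inside quotes.
    for (char& c : line) {
        if (c == '"') {
            quoted = !quoted;
            c = ' ';
        } else if (quoted && c == ' ') {
            c = marker;
        }
    }

    // Step 2: split on whitespace; the marker is not whitespace,
    // so protected spaces keep a quoted argument in one piece.
    std::istringstream in(line);
    std::vector<std::string> args;
    std::string word;
    while (in >> word) {
        // Step 3: turn the markers back into spaces.
        std::replace(word.begin(), word.end(), marker, ' ');
        args.push_back(word);
    }
    return args;
}
```

Because the quotes become separators, an empty quoted string (`""`) produces no argument, and `a"b c"d` comes out as three arguments rather than one.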
Pieter-Bas

Just passing the whole string to the shell might suit your needs:

e.g.:

system("./foo some arguments");

This isn't the best solution though.

The better way seems to be to write a parser that finds each argument and passes the result to an exec-style function.
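
A minimal sketch of that last step, assuming a POSIX system and that the arguments have already been split into a `std::vector<std::string>`. The helper names `toArgv` and `runCommand` are hypothetical; `fork`, `execvp` and `waitpid` are the standard POSIX calls.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Build the NULL-terminated pointer array that exec-style functions expect.
// The pointers refer into `args`, which must outlive the returned vector.
std::vector<char*> toArgv(std::vector<std::string>& args) {
    std::vector<char*> argv;
    for (std::string& a : args) {
        argv.push_back(&a[0]);
    }
    argv.push_back(nullptr);
    return argv;
}

// Hypothetical helper: runs args[0] with the given arguments and returns
// its exit status, 127 if the program could not be executed, or -1 on
// any other failure.
int runCommand(std::vector<std::string> args) {
    if (args.empty()) {
        return -1;
    }
    std::vector<char*> argv = toArgv(args);

    const pid_t pid = fork();
    if (pid < 0) {
        return -1;                     // fork failed
    }
    if (pid == 0) {
        execvp(argv[0], argv.data());  // replaces the child on success
        _exit(127);                    // only reached if execvp failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike `system`, nothing here is re-interpreted by a shell, so the arguments reach the program exactly as they were split.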

mikek3332002