5

I would like to split up a string using a space as my delimiter, but if there are multiple words enclosed in double or single quotes, then I would like them to be returned as one item.

For example if the input string is:

CALL "C:\My File Name With Space" /P1 P1Value /P1 P2Value

The output array would be:

Array[0]=CALL
Array[1]=C:\My File Name With Space
Array[2]=/P1
Array[3]=P1Value
Array[4]=/P1
Array[5]=P2Value

How do you use regular expressions to do this? I realize that there are command line parsers. I took a cursory look at a popular one, but it did not handle the situation where multiple parameters share the same name. In any event, rather than learning how to use a command line parsing library (I'll leave that for another day), I'm interested in getting more exposure to RegEx functions.

How would you use a RegEx function to parse this?

Chad
  • Is it not the case that you are given command line arguments as an array of strings in Main()? – i_am_jorf Jun 11 '13 at 18:55
  • No, I am parsing batch files in a folder. – Chad Jun 11 '13 at 18:57
  • I wouldn't use a regular expression to handle this. There are just too many special cases in command lines. You'd be better off using one of the recommendations from http://stackoverflow.com/questions/491595/best-way-to-parse-command-line-arguments-in-c?rq=1, or just writing your own (which would take a couple of hours, perhaps). – Jim Mischel Jun 11 '13 at 19:06
  • Actually, I think it was NDesk that didn't support multiple params with the same name (I could be wrong). I have a feeling RegEx can handle the 2 requirement criteria specified. That's all I'm looking for. – Chad Jun 11 '13 at 19:09
  • The problem is harder than it sounds. Parsing a Windows command line that includes quotes is pretty weird. See http://blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx for some examples. – Jim Mischel Jun 11 '13 at 20:15

3 Answers

11

The link in Jim Mischel's comment points out that the Win32 API provides a function for this. I'd recommend using that for consistency. Here's a sample (from PInvoke).

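using System;
using System.ComponentModel;          // Win32Exception
using System.Runtime.InteropServices; // DllImport, Marshal

// Place the members below inside a class.

// Splits a command line by calling the Win32 function CommandLineToArgvW.
static string[] SplitArgs(string unsplitArgumentLine)
{
    int numberOfArgs;
    IntPtr ptrToSplitArgs;
    string[] splitArgs;

    // Returns a pointer to an array of numberOfArgs wide-string pointers,
    // or IntPtr.Zero on failure.
    ptrToSplitArgs = CommandLineToArgvW(unsplitArgumentLine, out numberOfArgs);
    if (ptrToSplitArgs == IntPtr.Zero)
        throw new ArgumentException("Unable to split argument.",
          new Win32Exception());
    try
    {
        splitArgs = new string[numberOfArgs];
        // Read each pointer from the native array and copy its string.
        for (int i = 0; i < numberOfArgs; i++)
            splitArgs[i] = Marshal.PtrToStringUni(
                Marshal.ReadIntPtr(ptrToSplitArgs, i * IntPtr.Size));
        return splitArgs;
    }
    finally
    {
        // The native array must be released with LocalFree.
        LocalFree(ptrToSplitArgs);
    }
}

[DllImport("shell32.dll", SetLastError = true)]
static extern IntPtr CommandLineToArgvW(
    [MarshalAs(UnmanagedType.LPWStr)] string lpCmdLine,
    out int pNumArgs);

[DllImport("kernel32.dll")]
static extern IntPtr LocalFree(IntPtr hMem);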

If you want a quick-and-dirty, inflexible, fragile regex solution you can do something like this:

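// Requires: using System.Linq; using System.Text.RegularExpressions;
var rex = new Regex(@"("".*?""|[^ ""]+)+");
string test = "CALL \"C:\\My File Name With Space\" /P1 P1Value /P1 P2Value";
// m.Value is the matched text; a quoted argument keeps its surrounding quotes.
string[] array = rex.Matches(test).OfType<Match>().Select(m => m.Value).ToArray();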
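
The pattern above leaves the double quotes in each match, so the second item comes back as "C:\My File Name With Space" with its quotes attached. A minimal sketch of one way to get the array from the question is to strip the quotes after matching. The class and method names here (QuotedSplitter, Split) are made up for illustration, and removing every double quote is an assumption about what is wanted, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedSplitter
{
    // Same pattern as the regex answer: a token is one or more runs of either
    // a double-quoted span or characters that are neither a space nor a quote.
    private static readonly Regex Token = new Regex(@"("".*?""|[^ ""]+)+");

    public static string[] Split(string line)
    {
        return Token.Matches(line)
                    .Cast<Match>()
                    // Remove every double quote from the token. As a side
                    // effect, a"b c"d becomes the single token: ab cd
                    .Select(m => m.Value.Replace("\"", ""))
                    .ToArray();
    }
}
```

Calling QuotedSplitter.Split on the question's sample line should give six items, with C:\My File Name With Space (no quotes) as the second. Escaped quotes are still not handled.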
Chad
  • Worked like a charm. I'm surprised to see code going outside of the framework. I feel a little dirty, not sure why, probably because I don't understand it. – Chad Jun 11 '13 at 21:35
  • sqlcmd.exe (http://msdn.microsoft.com/en-us/library/ms162773.aspx) and probably other exes allow parameter switches, in the form of a dash followed by a single letter, to have an OPTIONAL space before the parameter value. For example, "sqlcmd.exe -sMyServer" and "sqlcmd.exe -s MyServer" indicate the same passed value. However, this function returns 2 arguments for the first and 3 for the second. – Chad Jun 13 '13 at 01:55
  • @ChadD - `CommandLineToArgvW` is what the shell uses to figure out how to pass arguments. sqlcmd.exe then contains logic that interprets them. `-s MyServer` is passed as two args, but sqlcmd.exe recognizes them as one option together. – Chad Jun 13 '13 at 13:34
  • The CommandLineToArgvW solution doesn't work as it doesn't respect special cases like \\ and \" – Wouter Schut Nov 18 '16 at 15:56
2

I wouldn't do it with Regex, for various reasons shown above.

If I did need to, this would match your simple requirements:

(".*?")|([^ ]+)

However, this doesn't include:

  • Escaped quotes
  • Single quotes
  • Non-ASCII quotes (you don't think people will paste smart quotes from Word into your file?)
  • combinations of the above

And that's just off the top of my head.
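
To make those gaps concrete, here is a small sketch that runs the simple pattern and returns the raw matches. The class and method names (SimplePatternDemo, Tokens) are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SimplePatternDemo
{
    // The simple pattern from this answer: either a double-quoted span
    // or a run of non-space characters.
    private static readonly Regex Simple = new Regex(@"("".*?"")|([^ ]+)");

    // Returns each match exactly as the regex found it (quotes included).
    public static string[] Tokens(string line)
    {
        return Simple.Matches(line)
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();
    }
}
```

With this, Tokens("x \"a b\"") keeps the quoted part together as "a b" (quotes included), but Tokens("'a b'") splits into 'a and b', and Tokens("“a b”") splits into “a and b”, because only the ASCII double quote is recognized.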

jedigo
1

@chad Henderson, you forgot to include the single quotes, and your pattern also has the problem of capturing anything that comes before a set of quotes.

Here is the correction including the single quotes, which also shows the problem with the extra capture before a quote: http://regexhero.net/tester/?id=81cebbb2-5548-4973-be19-b508f14c3348
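
As a rough sketch of the single-quote idea (not the pattern behind the link above, which is not reproduced here), a backreference can require the closing quote to be the same character as the opening one. The names MatchedQuoteSplitter and Split are hypothetical, and treating single quotes as delimiters follows the question's stated requirement rather than how Windows itself parses a command line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchedQuoteSplitter
{
    // Group 1: the opening quote, either " or '.
    // Group 2: the text up to the next occurrence of that same quote (\1).
    // Group 3: an unquoted run of non-space characters.
    private static readonly Regex Token =
        new Regex(@"([""'])(.*?)\1|([^ ]+)");

    public static string[] Split(string line)
    {
        return Token.Matches(line)
                    .Cast<Match>()
                    // Use the quoted text when the quoted branch matched,
                    // otherwise the unquoted run.
                    .Select(m => m.Groups[1].Success
                                     ? m.Groups[2].Value
                                     : m.Groups[3].Value)
                    .ToArray();
    }
}
```

For example, Split("CALL 'a b' \"it's here\" /P1") should return CALL, a b, it's here and /P1. A quote that starts in the middle of a token (such as x'y z') is still split at the space, and escaped quotes are not handled.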

Bruce Burge
  • Windows actually doesn't treat single quotes the same way it does double quotes. And you're not making sure the types of quotes match in your regex :). Just for fun, I updated mine to support args of the form `a"b c"d` – Chad Jun 11 '13 at 20:58
  • I'm curious about what the way windows treats single quotes has to do with this? – Bruce Burge Jun 11 '13 at 21:02
  • Windows treats `'a b'` as two separate arguments, `'a` and `b'` – Chad Jun 11 '13 at 21:04