The rules
The rules for parsing escapes are documented here: https://msdn.microsoft.com/en-us/library/17w5ykft.aspx
Microsoft C/C++ startup code uses the following rules when interpreting arguments given on the operating system command line:
- Arguments are delimited by white space, which is either a space or a tab.
- The caret character (^) is not recognized as an escape character or delimiter. The character is handled completely by the command-line parser in the operating system before being passed to the argv array in the program.
- A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
- A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
- Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
- If an even number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is interpreted as a string delimiter.
- If an odd number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is "escaped" by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
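As a rough illustration of the rules above, here is a minimal C# sketch of a splitter that applies them. The class name `CommandLineRulesSketch` and the method `Split` are made up for this example, and it covers only the rules quoted here: the real startup code also treats the program name specially and may behave differently in corner cases this sketch ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CommandLineRulesSketch
{
    // Hypothetical helper: splits an argument string using only the
    // rules quoted above. This is not the CRT's actual implementation.
    public static List<string> Split(string commandLine)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;
        var hasArgument = false; // true once the current argument has started
        var i = 0;

        while (i < commandLine.Length)
        {
            var ch = commandLine[i];
            if (ch == '\\')
            {
                // Count the whole run of backslashes.
                var start = i;
                while (i < commandLine.Length && commandLine[i] == '\\')
                {
                    i++;
                }
                var count = i - start;

                if (i < commandLine.Length && commandLine[i] == '"')
                {
                    // One backslash is emitted for every pair.
                    current.Append('\\', count / 2);
                    if (count % 2 == 1)
                    {
                        // Odd run: the leftover backslash escapes the quote.
                        current.Append('"');
                    }
                    else
                    {
                        // Even run: the quote is a string delimiter.
                        inQuotes = !inQuotes;
                    }
                    i++; // consume the quote
                }
                else
                {
                    // Not followed by a quote: backslashes are literal.
                    current.Append('\\', count);
                }
                hasArgument = true;
            }
            else if (ch == '"')
            {
                // Unescaped quote: toggles quoting; "" yields an empty argument.
                inQuotes = !inQuotes;
                hasArgument = true;
                i++;
            }
            else if ((ch == ' ' || ch == '\t') && !inQuotes)
            {
                // Unquoted white space ends the current argument, if any.
                if (hasArgument)
                {
                    result.Add(current.ToString());
                    current.Clear();
                    hasArgument = false;
                }
                i++;
            }
            else
            {
                current.Append(ch);
                hasArgument = true;
                i++;
            }
        }

        if (hasArgument)
        {
            result.Add(current.ToString());
        }
        return result;
    }
}
```

For instance, under these rules the input `\"quote` splits into the single argument `"quote`, and `"trailing backslash\\"` splits into `trailing backslash\`.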
Application to generation
Unfortunately, there is no good documentation on how to escape arguments properly, i.e. how to apply the above rules so that an array of arguments reaches the target application intact. Here are the rules I followed for escaping each argument:
- If the argument contains a space or tab, wrap it in " (double quote) characters.
- If the argument contains a " (double quote), escape it as \". If that " is immediately preceded by one or more \ (backslash) characters, double each of those backslashes before appending the escaped ".
- If the argument contains white space and ends with one or more \ (backslash) characters, double those trailing backslashes before appending the closing " (double quote).
The code
/// <summary>
/// Escape a single argument so that it can be used as part of
/// Process.StartInfo.Arguments.
/// </summary>
/// <param name="argument">
/// The argument to escape.
/// </param>
/// <returns>
/// The escaped argument <see cref="string"/>.
/// </returns>
public static string EscapeArguments(string argument)
{
    using (var characterEnumerator = argument.GetEnumerator())
    {
        var escapedArgument = new StringBuilder();
        var backslashCount = 0;
        // An empty argument must be written as "" or it would vanish from
        // the command line, so force quoting in that case.
        var needsQuotes = argument.Length == 0;
        while (characterEnumerator.MoveNext())
        {
            switch (characterEnumerator.Current)
            {
                case '\\':
                    // Backslashes are simply passed through, except when they need
                    // to be escaped when followed by a \", e.g. the argument string
                    // \", which would be encoded to \\\"
                    backslashCount++;
                    escapedArgument.Append('\\');
                    break;
                case '\"':
                    // Escape any preceding backslashes
                    for (var c = 0; c < backslashCount; c++)
                    {
                        escapedArgument.Append('\\');
                    }
                    // Append an escaped double quote.
                    escapedArgument.Append("\\\"");
                    // Reset the backslash counter.
                    backslashCount = 0;
                    break;
                case ' ':
                case '\t':
                    // White spaces are escaped by surrounding the entire string with
                    // double quotes, which should be done at the end to prevent
                    // multiple wrappings.
                    needsQuotes = true;
                    // Append the whitespace
                    escapedArgument.Append(characterEnumerator.Current);
                    // Reset the backslash counter.
                    backslashCount = 0;
                    break;
                default:
                    // Reset the backslash counter.
                    backslashCount = 0;
                    // Append the current character
                    escapedArgument.Append(characterEnumerator.Current);
                    break;
            }
        }
        // No need to wrap in quotes
        if (!needsQuotes)
        {
            return escapedArgument.ToString();
        }
        // Prepend the "
        escapedArgument.Insert(0, '"');
        // Double any trailing backslashes so they do not escape the closing "
        for (var c = 0; c < backslashCount; c++)
        {
            escapedArgument.Append('\\');
        }
        // Append the final "
        escapedArgument.Append('\"');
        return escapedArgument.ToString();
    }
}
/// <summary>
/// Convert an argument array to an argument string for using
/// with Process.StartInfo.Arguments.
/// </summary>
/// <param name="args">
/// The args to convert.
/// </param>
/// <returns>
/// The argument <see cref="string"/>, with each argument escaped and
/// separated from the next by a single space.
/// </returns>
public static string EscapeArguments(params string[] args)
{
    var arguments = new StringBuilder();
    for (var i = 0; i < args.Length; i++)
    {
        // Separate consecutive arguments with a single space.
        if (i > 0)
        {
            arguments.Append(' ');
        }
        // Escape each argument individually.
        arguments.Append(EscapeArguments(args[i]));
    }
    return arguments.ToString();
}
Test Cases
Here are the test cases I used to verify the above code (the harness is left as an exercise for the reader).
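NOTE: My test took a random number of the cases below as an input args array, encoded it into an argument string, passed the string to a new process that output its arguments as a JSON array, and verified that the input args array matched the output JSON array. The inputs in the rows whose escaped form ends with a space before the closing quote appear to end with a trailing space as well, which the table layout hides; that trailing space is why those arguments get quoted.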
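One way the child side of such a harness could look is sketched below. The names `ArgEcho` and `Run` are hypothetical, and the sketch assumes System.Text.Json is available (.NET Core 3.0 or later); it is only an illustration, not the original harness.

```csharp
using System;
using System.IO;
using System.Text.Json;

public static class ArgEcho
{
    // Hypothetical child-side helper: writes the arguments the process
    // received as a JSON array. Call it from the child's Main(string[] args),
    // e.g. ArgEcho.Run(args, Console.Out).
    public static void Run(string[] args, TextWriter output)
    {
        output.WriteLine(JsonSerializer.Serialize(args));
    }
}
```

The parent can then read the child's standard output, turn it back into an array with `JsonSerializer.Deserialize<string[]>`, and compare it element by element with the input. Comparing the decoded arrays is safer than comparing raw JSON text, because the default serializer writes characters such as " as \u0022.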
+---------------------------------------+--------------------------------------------+
| Input String | Escaped String |
+---------------------------------------+--------------------------------------------+
| quoted argument | "quoted argument" |
| "quote | \"quote |
| "wrappedQuote" | \"wrappedQuote\" |
| "quoted wrapped quote" | "\"quoted wrapped quote\"" |
| \backslashLiteral | \backslashLiteral |
| \\doubleBackslashLiteral | \\doubleBackslashLiteral |
| trailingBackslash\ | trailingBackslash\ |
| doubleTrailingBackslash\\ | doubleTrailingBackslash\\ |
| \ quoted backslash literal | "\ quoted backslash literal" |
| \\ quoted double backslash literal | "\\ quoted double backslash literal" |
| quoted trailing backslash\ | "quoted trailing backslash\\" |
| quoted double trailing backslash\\ | "quoted double trailing backslash\\\\" |
| \"\backslashQuoteEscaping | "\\\"\backslashQuoteEscaping " |
| \\"\doubleBackslashQuoteEscaping | "\\\\\"\doubleBackslashQuoteEscaping " |
| \\"\\doubleBackslashQuoteEscaping | "\\\\\"\\doubleBackslashQuoteEscaping " |
| \"\\doubleBackslashQuoteEscaping | "\\\"\\doubleBackslashQuoteEscaping " |
| \"\backslash quote escaping | "\\\"\backslash quote escaping " |
| \\"\double backslash quote escaping | "\\\\\"\double backslash quote escaping " |
| \\"\\double backslash quote escaping | "\\\\\"\\double backslash quote escaping " |
| \"\\double backslash quote escaping | "\\\"\\double backslash quote escaping " |
| TrailingQuoteEscaping" | TrailingQuoteEscaping\" |
| TrailingQuoteEscaping\" | TrailingQuoteEscaping\\\" |
| TrailingQuoteEscaping\"\ | TrailingQuoteEscaping\\\"\ |
| TrailingQuoteEscaping"\ | TrailingQuoteEscaping\"\ |
| Trailing Quote Escaping" | "Trailing Quote Escaping\"" |
| Trailing Quote Escaping\" | "Trailing Quote Escaping\\\"" |
| Trailing Quote Escaping\"\ | "Trailing Quote Escaping\\\"\\" |
| Trailing Quote Escaping"\ | "Trailing Quote Escaping\"\\" |
+---------------------------------------+--------------------------------------------+
There are other answers to this question here on SO. I simply prefer a hand-coded state machine to regular expressions (it also runs faster).
https://stackoverflow.com/a/6040946/3591916 has a good explanation of how to do it.