40

I'm trying to figure out why my unit test fails (the third assert below):

var date = new DateTime(2017, 1, 1, 1, 0, 0);

var formatted = "{countdown|" + date.ToString("o") + "}";

//Works
Assert.AreEqual(date.ToString("o"), $"{date:o}");
//Works
Assert.AreEqual(formatted, $"{{countdown|{date.ToString("o")}}}");
//This one fails
Assert.AreEqual(formatted, $"{{countdown|{date:o}}}");

AFAIK, this should work correctly, but it appears that the format specifier isn't passed in correctly: the result comes out as just {countdown|o}. Any idea why this is failing?

Blue
  • It appears (though I hate saying it) that it's a compiler bug. – DavidG Feb 09 '17 at 16:44
  • @DavidG: Could be a compiler bug, or could be a bug in the underlying formatting library, but I agree that something smells bad here. It at least should be investigated. – Eric Lippert Feb 09 '17 at 16:45
  • It seems to be to do with the way the closing interpolation bracket is evaluated. With the code above the outer bracket closes the interpolation `{{countdown|**{**date:o}}**}**`, a space between the brackets causes it to evaluate to the inner bracket `{{countdown|**{**date:o**}**_}}`. – Equalsk Feb 09 '17 at 16:48
  • Note the issue is not due to string interpolation, it's inside `string.Format` somewhere (e.g. `string.Format("{{{0:o}}}", date)`) – DavidG Feb 09 '17 at 16:49
  • I think you meant `Assert.AreEqual(formatted, $"{{{$"countdown|{date:o}"}}}");` – Mikko Viitala Feb 09 '17 at 16:52
  • It looks like `o` is interpreted as a part of custom date time format. And since it isn't valid format specifier it's just copied to output. See [Custom Date and Time Format Strings documentation page](https://msdn.microsoft.com/en-us/library/8kb3ddd4(v=vs.110).aspx#sSpecifier). – Leonid Vasilev Feb 09 '17 at 16:52
  • @LeonidVasilyev `o` is a standard format string though https://msdn.microsoft.com/en-us/library/az4se3k1(v=vs.110).aspx#Roundtrip – DavidG Feb 09 '17 at 16:54
  • this should work `$"{{countdown|{date:o}"+"}";` it can not manage the closing curl and the double curl escape - it starts escaping from left to right –  Feb 09 '17 at 16:55
  • @DavidG That is correct, but it looks like it is interpreted as custom. – Leonid Vasilev Feb 09 '17 at 16:55
  • @FrankerZ Your bounty says this is a bug, but I don't think it is, it's just a symptom of how the braces are counted up and the answer from user1892538 demonstrates this perfectly. – DavidG Feb 12 '17 at 18:32
  • My bounty says it *may* be a bug. – Blue Feb 13 '17 at 13:35

4 Answers

22

The problem with this line

Assert.AreEqual(formatted, $"{{countdown|{date:o}}}");

is that you have 3 closing curly braces after the format string of the variable, and escaping proceeds from left to right. The first 2 curly braces are therefore treated as an escaped brace that belongs to the format string, and the third curly brace as the closing one.

So it turns o into o}, and then it's unable to apply the intended format.

This should work

Assert.AreEqual(formatted, $"{{countdown|{date:o}"+"}");

Notice that the simpler $"{date}}}" (i.e. 3 curly braces after the variable, without a format string) does work, because it recognizes that the first curly brace is the closing one. It is the interpretation of the format specifier after the : that breaks the identification of the correct closing brace.

To prove that the format string is escaped like a string, consider that the following

$"{date:\x6f}"

is treated as

$"{date:o}"

Finally, it is perfectly possible for doubled (escaped) curly braces to be part of a custom date format, so the behaviour of the compiler is entirely reasonable. Again, a concrete example

$"{date:MMM}}dd}}yyy}" // it's a valid feb}09}2017

Parsing is a formal process based on expression grammar rules; it can't be done by just glancing at the string.
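As a minimal sketch of the cases discussed above, the forms that avoid the }}} sequence after a format specifier can be put side by side. The class and method names (BraceWorkarounds, SplitClosingBrace, ExplicitToString, NoSpecifier) are hypothetical and only serve the illustration; the sketch deliberately leaves out the failing form.

```csharp
using System;

// Hypothetical helper class, not part of the original question.
public static class BraceWorkarounds
{
    // The literal closing brace lives in a separate string, so the
    // format specifier "o" is followed by exactly one '}'.
    public static string SplitClosingBrace(DateTime date) =>
        $"{{countdown|{date:o}" + "}";

    // The format is applied by ToString("o") inside the hole. With no ':'
    // specifier, the first '}' closes the hole and "}}" is a literal '}'.
    public static string ExplicitToString(DateTime date) =>
        $"{{countdown|{date.ToString("o")}}}";

    // No format specifier at all: "}}}" is read as the closing '}' of the
    // hole followed by one escaped literal '}'.
    public static string NoSpecifier(DateTime date) =>
        $"{date}}}";
}
```

With date = new DateTime(2017, 1, 1, 1, 0, 0), the first two methods return the same string as "{countdown|" + date.ToString("o") + "}", and the third returns date.ToString() + "}".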

  • The number of braces is not an issue as `$"{{{date:o}}}"` still produces invalid output. – DavidG Feb 09 '17 at 17:02
  • @DavidG *3 curly quotes* after the format string `o` *do* produce invalid output, as I have written in my answer - that is now **correct** –  Feb 09 '17 at 18:13
  • Nice analysis. I am surprised to learn that the lexer treats the `}}}` which follows the format specifier as an escaped `}` followed by a meaningful `}`. I would have naively expected the rule to be "once you find a meaningful `{`, parse formatted expressions until you find the matching `}`, and then resume the regular string lexing". Next time I see Neal I'll ask what led to this somewhat surprising result. – Eric Lippert Feb 10 '17 at 00:22
  • Another example is `Console.WriteLine(" {0} is rendered as {0:000}}000.000}",2000.4);` where the output is `2000,4 is rendered as 002}000,400` and so, once it finds a meaningful `{`, it parses until it finds the matching `}` or a `:` and in the latter case it resumes the `}` escaping –  Feb 10 '17 at 02:46
  • @user1892538 Is this the intended behavior? – Blue Feb 10 '17 at 11:51
  • @FrankerZ yep, but in either case... just let me quote this [comment](http://stackoverflow.com/questions/7114619/c-sharp-string-format-with-curly-bracket-in-string#comment8524941_7114654) of about 6 years ago... –  Feb 10 '17 at 13:02
  • @user1892538 Very nice observation. I am surprised why this is not accepted as answer yet.. – Chetan Feb 15 '17 at 13:25
6

This is a follow-up to my original answer, in order to make sure this is the intended behavior.

As far as an official source is concerned, we should refer to the Interpolated Strings page on MSDN.

The structure of an interpolated string is

$ " <text> { <interpolation-expression> <optional-comma-field-width> <optional-colon-format> } <text> ... } "  

and each single interpolation is formally defined with a syntax

single-interpolation:  
    interpolation-start  
    interpolation-start : regular-string-literal  

interpolation-start:  
    expression  
    expression , expression  

What counts here is that

  1. the optional-colon-format is defined as a regular-string-literal syntax => i.e. it can contain an escape-sequence, according to the paragraph 2.4.4.5 String literals of the C# Language Specification 5.0
  2. You can use an interpolated string anywhere you can use a string literal
  3. To include a curly brace ({ or }) in an interpolated string use two curly braces, {{ or }} => i.e. the compiler escapes two curly braces in the optional-colon-format
  4. the compiler scans the contained interpolation expressions as balanced text until it finds a comma, colon, or close curly brace => i.e. a colon breaks the balanced text as well as a close curly brace

Just to be clear, this explains the difference between the two cases. In $"{{{date}}}", date is an expression, so it is tokenized up to the first closing curly brace. In $"{{{date:o}}}", date is again an expression, but now it is tokenized only up to the colon; after the colon a regular string literal begins, and the compiler resumes treating two consecutive curly braces as an escape.

There is also the String Formatting FAQ on MSDN, which covers this case explicitly.

int i = 42;
string s = String.Format("{{{0:N}}}", i);   //prints '{N}'

The question is, why did this last attempt fail? There are two things you need to know in order to understand this result:

  1. When providing a format specifier, string formatting takes these steps:
     • Determine if the specifier is longer than a single character: if so, then assume that the specifier is a custom format. A custom format will use suitable replacements for your format, but if it doesn't know what to do with some character, it will simply write it out as a literal found in the format.
     • Determine if the single character specifier is a supported specifier (such as 'N' for number formatting). If it is, then format appropriately. If not, throw an ArgumentException.
  2. When attempting to determine whether a curly bracket should be escaped, the curly brackets are simply treated in the order they are received. Therefore, {{{ will escape the first two characters and print the literal {, and the third curly bracket will begin the formatting section. On this basis, in }}} the first two curly brackets will be escaped, therefore a literal } will be written to the format string, and then the last curly bracket will be assumed to be ending a formatting section.

With this information, we now can figure out what's occurring in our {{{0:N}}} situation. The first two curly brackets are escaped, and then we have a formatting section. However, we then also escape the closing curly bracket, before closing the formatting section. Therefore, our formatting section is actually interpreted as containing 0:N}. Now, the formatter looks at the format specifier and it sees N} for the specifier. It therefore interprets this as a custom format, and since neither N nor } means anything for a custom numeric format, these characters are simply written out, rather than the value of the variable referenced.
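The left-to-right rule quoted above can be modelled with a small scanner. This is only a sketch of the rule as the FAQ describes it, not the library's actual implementation, and other runtime versions may scan format items differently. The names BraceScanModel, Scan and ScanItem, as well as the "lit:" and "item:" prefixes, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model of the escaping rule described in the FAQ.
public static class BraceScanModel
{
    // Splits a composite format string into "lit:<text>" pieces for literal
    // text and "item:<index>[:<format>]" pieces for format items.
    public static List<string> Scan(string format)
    {
        var parts = new List<string>();
        var literal = new StringBuilder();
        int i = 0;
        while (i < format.Length)
        {
            char c = format[i];
            bool doubled = i + 1 < format.Length && format[i + 1] == c;
            if ((c == '{' || c == '}') && doubled)
            {
                literal.Append(c);          // "{{" or "}}" is one literal brace
                i += 2;
            }
            else if (c == '{')
            {
                if (literal.Length > 0)
                {
                    parts.Add("lit:" + literal.ToString());
                    literal.Clear();
                }
                i = ScanItem(format, i + 1, parts);
            }
            else if (c == '}')
            {
                throw new FormatException("Unmatched '}' at position " + i);
            }
            else
            {
                literal.Append(c);
                i++;
            }
        }
        if (literal.Length > 0) parts.Add("lit:" + literal.ToString());
        return parts;
    }

    // Reads one item starting just after its '{' and returns the index of
    // the character that follows its closing '}'.
    private static int ScanItem(string format, int i, List<string> parts)
    {
        var item = new StringBuilder();
        bool inSpecifier = false;
        while (i < format.Length)
        {
            char c = format[i];
            if (c == ':') inSpecifier = true;
            if (c == '}')
            {
                bool doubled = i + 1 < format.Length && format[i + 1] == '}';
                if (inSpecifier && doubled)
                {
                    item.Append('}');       // after ':' a "}}" is an escaped '}'
                    i += 2;
                    continue;
                }
                parts.Add("item:" + item.ToString());
                return i + 1;               // a single '}' closes the item
            }
            item.Append(c);
            i++;
        }
        throw new FormatException("Format item is not closed");
    }
}
```

Under this model, Scan("{{{0:N}}}") yields the two pieces lit:{ and item:0:N}, which matches the FAQ's reading of the specifier as N}, while Scan("{{{0}}}") yields lit:{, item:0 and lit:}.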

2

The problem seems to be that to insert a curly brace while using string interpolation you need to escape it by doubling it. If you add the brace used for the interpolation itself, you end up with a triple brace such as the one in the line that fails:

Assert.AreEqual(formatted, $"{{countdown|{date:o}}}");

Now, if we look at the "}}}", we can see that the first brace is meant to close the string interpolation, while the final two are meant to be treated as an escaped brace character.

The compiler, however, treats the first two as the escaped brace character, so it inserts a literal character between the interpolation delimiters. Basically the compiler is doing something like this:

string str = "a string";
$"{str'}'}"; //this would obviously generate a compile error which is bypassed by this bug

You can resolve this by reformatting the line as such:

Assert.AreEqual(formatted, $"{{countdown|{$"{date:o}"}}}");
Innat3
1

This is the easiest way to get the assert to work...

Assert.AreEqual(formatted, "{" + $"countdown|{date:o}" + "}");

In this form...

Assert.AreEqual(formatted, $"{{countdown|{date:o}}}");

The first 2 closing braces are interpreted as a literal closing brace and the third as closing the formatting expression.

This is not a bug so much as a limitation of the grammar for interpolated strings. The bug, if there is one, is that the output of the formatted text should probably be "o}" instead of just "o".

The reason we have the operator "+=" instead of "=+" in C, C#, and C++ is that in the form =+ you cannot tell in some cases whether the "+" is part of the operator or a unary "+".

AQuirky