1

A previous C++ question asked why `int (x) = 0;` is allowed. However, I noticed that even `int(x) = 0;` is allowed, i.e. without a space before the `(x)`. I find the latter quite strange, because it causes things like this:

using Oit = std::ostream_iterator<int>;
Oit bar(std::cout);
*bar = 6;  // * is optional
*Oit(bar) = 7;  // * is NOT optional!

The `*` on the final line is required because omitting it makes the compiler think we are declaring `bar` again and initializing it to 7.

Am I interpreting this correctly, that `int(x) = 0;` is indeed equivalent to `int x = 0;`, and `Oit(bar) = 7;` is indeed equivalent to `Oit bar = 7;`? If yes, why specifically does C++ allow omitting the space before the parentheses in such a declaration + initialization?
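For reference, here is a minimal sketch of the two cases. The function names `declared_with_parens` and `write_through_temporary` are made up for illustration, and a `std::ostringstream` stands in for `std::cout` so that the written output can be inspected. Some compilers may warn about the redundant parentheses in the declaration.

```cpp
#include <iterator>
#include <sstream>
#include <string>

using Oit = std::ostream_iterator<int>;

// Declares a variable using the parenthesized declarator form.
int declared_with_parens() {
    int(x) = 5;  // a declaration: same meaning as `int x = 5;`
    x += 1;      // x is an ordinary named variable afterwards
    return x;    // 6
}

// Writes through an iterator and through a temporary copy of it.
std::string write_through_temporary() {
    std::ostringstream out;
    Oit bar(out);
    *bar = 6;       // writes 6 to the stream
    *Oit(bar) = 7;  // expression: copy bar, dereference the copy, write 7
    // Without the leading *, `Oit(bar) = 7;` would be parsed as a
    // declaration of bar, which fails to compile here (bar already exists).
    return out.str();  // "67"
}
```

Calling the two functions should give `6` and `"67"` respectively, under the assumptions above.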

(my guess is that the C++ compiler does not care about any space before a left paren, since it treats that parenthesized expression as its own "token" [excuse me if I'm butchering the terminology], i.e. in all cases, `qux(baz)` is equivalent to `qux (baz)`)

xdavidliu
  • Yeah, whitespace in C++ is in general optional everywhere, except where needed to separate two tokens. `int` and `(` are already separate tokens so the whitespace is not required. But `return x;` obviously cannot be replaced by `returnx;` because `returnx` would be a single token. – Nate Eldredge Jul 02 '22 at 19:49
  • @NateEldredge There are a few exceptions, like the opening paren in a function-like macro, or within string and character literals, or between `operator""` and a reserved word like `if`, where the space makes a difference. However, in most places the grammar doesn't require space, only the lexer to separate tokens. – user3188445 Jul 02 '22 at 19:55
  • @user3188445 In `operator""if` the `""if` part is a single token, while there are two tokens `""` and `if` if separated by a whitespace. So this still falls under the token separation exception. The same with spaces inside string/character literals. They are also single tokens. – user17732522 Jul 02 '22 at 20:25
  • @user17732522 Sort of but not exactly. After all, `operator"" _blah` (with a space) is fine, but `operator"" if` (with a space) is not. So if you view `operator""_blah` as a single token, then it's weird that you are allowed to place a space in the middle of the token without changing the semantics of the program. – user3188445 Jul 02 '22 at 21:42
  • @user3188445 `operator""_blah` are two tokens: `operator` and `""_blah`. The former is a _keyword_ and the latter is a [_user-defined-literal_](https://www.eel.is/c++draft/lex#nt:user-defined-literal). `operator"" _blah` are three tokens (`operator`/`""`/`_blah`). The latter is just defined to behave like the former for identifiers as last token (in the context of a user-defined literal operator name). The distinction is relevant e.g. because `""_blah` will not be expanded if `_blah` is an object-like macro and because identifiers may be reserved, but the _ud-suffix_ is not an identifier. – user17732522 Jul 02 '22 at 22:24

2 Answers

2

It is allowed in C++ because it is allowed in C, and requiring the space would be an unnecessary change that breaks C compatibility. Even setting that aside, it would be surprising for `int (x)` and `int(x)` to behave differently, since C++ is generally (with a few minor exceptions) agnostic to additional white-space as long as tokens are properly separated. And `(` (outside a string/character literal) is always a token on its own. It can't be part of a token starting with `int(`.

In C, `int(x)` has no other potential meaning with which it could be confused, so there is no reason to require white-space separation at all. C is also generally agnostic to white-space, so it would be surprising there as well to have different behavior with and without it.
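As a small illustration of both points (a sketch only; the function names `spacing_is_irrelevant` and `cast_or_declaration` are hypothetical), the amount of white-space around the parentheses does not change the meaning, while in C++ the surrounding context decides whether `int(...)` is a declaration or a functional-notation cast:

```cpp
// All three statements are declarations; the spacing makes no difference.
int spacing_is_irrelevant() {
    int(a) = 1;
    int (b) = 2;
    int  (  c  )  = 3;
    return a + b + c;  // 6
}

// The same spelling reads as a cast or as a declaration depending on context.
int cast_or_declaration() {
    double d = 3.9;
    int y = int(d);  // inside an expression: functional cast, truncates to 3
    int(z) = y;      // at the start of a statement: declares z, same as `int z = y;`
    return y + z;    // 6
}
```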

user17732522
  • What is confusing is that depending on context this is a variable declaration `int x;` or a cast `(int)x`. – Goswin von Brederlow Jul 02 '22 at 20:28
  • @GoswinvonBrederlow Yes, but that doesn't seem like enough of a reason to introduce an incompatibility with C and a special case for white-space relevance by enforcing that the variant without space may only be a functional notation explicit cast, not a declaration. In the end overloading `(`/`)` with a new meaning for C++ instead of using something less ambiguous might not have been the best choice, but it is too late to really do anything about it. – user17732522 Jul 02 '22 at 20:32
  • No there isn't. I find the given example very confusing. What is that code even doing? I think `*Oit(bar) = 7;` copy constructs a new iterator, dereferences it and assigns the value 7 (which prints 7), right? So really why would you ever write code like that? – Goswin von Brederlow Jul 02 '22 at 20:37
  • I would not write code like that, I'm more curious as to the allowed forms in C++ and reasons behind that – xdavidliu Jul 02 '22 at 21:20
1

One requirement when defining the syntax of a language is that elements of the language can be separated. According to the C++ syntax rules, a space separates things. But according to the same rules, parentheses also separate things.

When C++ is compiled, the first step is parsing. And one of the first steps of parsing is separating all the elements of the language. This step is often called tokenizing or lexing. But this is just the technical background. The user does not have to know this. He or she only has to know that things in C++ must be clearly separated from each other, so that there is a sequence "*", "Oit", "(", "bar", ")", "=", "7", ";".

As explained, the rule that a parenthesis always separates is established at a very low level of the compiler. Even before it knows what the purpose of the parenthesis is, the compiler determines that a parenthesis separates things. Therefore an extra space would be redundant.

If you ever use parser generators, you will see that most of them just ignore spaces. That means that once the lexer has produced the list of tokens, the spaces do not exist any more. See the list above: it contains no spaces. So the grammar has no chance to specify something that explicitly requires a space.
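To illustrate the idea, here is a toy tokenizer. It is only a sketch and far simpler than the real C++ lexer: `toy_tokenize` is a made-up name, it treats runs of letters, digits and underscores as one token and every other non-space character as a token of its own, and it knows nothing about literals, multi-character operators or the preprocessor.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits src into tokens and drops all white-space along the way.
std::vector<std::string> toy_tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // white-space only separates tokens; it is never stored
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;  // take the longest run of word characters
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));  // e.g. "(", ")", "*", ";"
            ++i;
        }
    }
    return tokens;
}
```

With this sketch, `toy_tokenize("*Oit(bar) = 7;")` and `toy_tokenize("* Oit ( bar ) = 7 ;")` produce the same eight tokens, whereas `"returnx;"` and `"return x;"` do not tokenize the same way, because the space there is what separates two word tokens.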

habrewning
  • I think it is correct that there is no whitespace dependence after tokenization in [translation phase 7](https://www.eel.is/c++draft/lex.phases#1.7), but in the preprocessing stage C++ has several exceptions where the presence or absence of whitespace between preprocessing tokens _does_ matter beyond just separation of the tokens, e.g. when defining/using function-like macros or when recognizing preprocessor directives. – user17732522 Jul 02 '22 at 22:32
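
A small sketch of the function-like macro case mentioned in the comments (the macro and function names are made up for illustration): the two definitions differ only in the space before the opening paren, yet the first defines a macro taking a parameter `n`, while the second defines an object-like macro whose replacement text is `(n) + 1`.

```cpp
#define INC(n) ((n) + 1)    // function-like: INC(1) expands to ((1) + 1)
#define INC_SPACED (n) + 1  // object-like: INC_SPACED expands to (n) + 1

int use_function_like() {
    return INC(1);  // 2
}

int use_object_like() {
    int n = 10;         // the expansion refers to whatever n is in scope
    return INC_SPACED;  // (n) + 1, i.e. 11
}
```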