38

Consider this call to a function declared as int sum(int, int):

printf("%d", sum(a,b));

How does the compiler decide that the , used in the function call sum(a,b) is not a comma operator?

NOTE: I didn't want to actually use the comma operator in the function call. I just wanted to know how the compiler knows that it is not a comma operator.

user2018675
haccks
  • 2
    you are talking about which one of the two commas... – Sazzadur Rahaman Jun 29 '13 at 18:39
  • @SazzadurRahaman; comma in the function call. – haccks Jun 29 '13 at 18:44
  • 4
    Why are people voting to close this? – haccks Jun 29 '13 at 19:14
  • 22
    Disagree on this question being off-topic. The question asks a subtle detail about how a certain syntax can be interpreted by implementations and it can be conclusively answered by citing the relevant standardese quotes. *Efforts on trying to solve the problem* doesn't apply here. Understanding or hunting down standardese quotes is not really a trivial task. – Alok Save Jun 29 '13 at 19:38
  • 1
    There are two function calls, one to `sum` and one to `printf`. – Keith Thompson Jun 29 '13 at 19:55
  • https://en.wikipedia.org/wiki/Occurs_check the compiler converts the C code to symbols and then performs an occurs check – David K Jun 29 '13 at 20:51
  • 3
    I once had some C code behave strange because I was doing a division by an integer via a pointer. ie, the expression was `a/*b`. It was fixed by adding some whitespace: `a / *b` – Stewart Jun 29 '13 at 21:10
  • @Stewart: I do not understand what you want to say. – haccks Jul 02 '13 at 11:35
  • Just that a compiler, like any software, is simply a machine following rules. In my example, the rule that `/*` means 'start comment' is senior to `/` means 'division', `*` means dereference pointer. With your example it'll be something equally simple, such as `,` inside `()` means 'argument separator'. That's all. – Stewart Jul 03 '13 at 06:50
  • 1
    «@SazzadurRahaman; comma in the function call»: but they're both function calls! `:P` – JMCF125 Feb 10 '14 at 22:34

6 Answers

49

Look at the grammar for the C language. It's listed, in full, in Annex A of the standard. The way it works is that you can step through the tokens of a C program and match each one against the next item in the grammar. At each step you have only a limited number of options, so the interpretation of any given token depends on the context in which it appears. Inside each rule in the grammar, each line gives a valid alternative for the program to match.

Specifically, if you look for argument-expression-list (the rule for the arguments of a call; parameter-list is its counterpart for declarations), you will see that it contains an explicit comma. Therefore, whenever the compiler's C parser is in "argument list" mode, commas that it finds will be understood as argument separators, not as comma operators. The same is true for parentheses (which can also occur in expressions).

This works because the argument-expression-list rule is careful to use assignment-expression rules, rather than just the plain expression rule. An expression can contain commas at its top level, whereas an assignment-expression cannot. If this were not the case the grammar would be ambiguous, and the compiler would not know what to do when it encountered a comma inside an argument list.

However, an opening parenthesis that is not part of a function definition or call, or of an if, while, or for statement, will be interpreted as part of an expression (there is no other option, and this applies only if the start of an expression is a valid choice at that point). Inside those parentheses the expression syntax rules apply, and they allow comma operators.
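To make the two readings of the comma concrete, here is a minimal sketch. The helper names demo_separator and demo_operator are made up for illustration, and sum is assumed to simply add its arguments. The (void) cast only silences the "value unused" warning that many compilers give for a discarded left operand.

```c
/* Assumed definition of the two-argument function from the question. */
int sum(int x, int y) { return x + y; }

/* The comma sits directly in the argument list: it separates two arguments. */
int demo_separator(int a, int b) {
    return sum(a, b);
}

/* The extra parentheses start a full expression, so the comma inside them is
   the comma operator: a is evaluated and discarded, and the value is b.
   The second comma is back at argument-list level and is a separator. */
int demo_operator(int a, int b, int c) {
    return sum(((void)a, b), c);   /* behaves like sum(b, c) */
}
```

Calling demo_separator(1, 2) adds 1 and 2, while demo_operator(1, 2, 10) adds 2 and 10, because the parenthesized first argument collapses to b.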

haccks
ams
  • 4
    I had forgotten that there is a technical term with that name. I merely mean that any given token can only be understood in the context in which it appears. In other words, I'm using "context sensitive" as an adjective rather than a noun. However, I suspect that the only people confused by this were people who already knew the answer! – ams Jul 02 '13 at 08:35
  • This is a good answer but you should also mention that the *things between the commas* are `assignment-expression` nonterminals rather than `expression` nonterminals (as discussed in Jens' answer), thus disallowing `,` at top level of a `parameter-list` from being the comma operator. If the standard did what you describe without also doing this, the overall grammar would be ambiguous. – zwol Feb 10 '14 at 15:47
  • @Zack, quite so. I've expanded the answer with that info. – ams Feb 11 '14 at 17:06
  • @EricLippert: I don't think it makes sense to say C has a context-free grammar. If you go that direction, then you could also claim C++ has a CFG (since, just like in C's case, it's ambiguous and requires a semantic pass to reject invalid programs). If you want to be really strict then you could also claim most programming languages do *not* have CFGs because they all require declarations before definitions before the program is deemed valid, which isn't context-free. Neither is a very useful definition since it puts most languages in the same category. (cont'd) – user541686 Jun 15 '14 at 22:19
  • @EricLippert: (cont'd) ... from a practical standpoint (maybe not so much on the theory side) I feel a useful definition would be that C is context-free iff it has a CFG that unambiguously parses all valid C programs *assuming there are no undeclared identifiers*. But in that case, C is not context-free (and thus has no CFG) because of the classic `T * T;` ambiguity, which requires knowing what `T` *is* (and not merely whether it's declared). Hence I don't think it makes sense to say C is context-free. – user541686 Jun 15 '14 at 22:26
  • @Mehrdad: I take your point, but you are using "grammar" in a broader sense than it is usually construed. The famous sentence "Colourless green ideas sleep furiously." is *grammatical* in English but it is *nonsensical*, and the sentence "Bob Smith is the king of England." is *grammatical* but *false*. The *grammar of the C language* does not intend to be one-stop-shopping for determining what is a *legal* C program any more than the grammar of English determines what is a true statement. – Eric Lippert Jun 16 '14 at 13:44
  • @EricLippert: Thanks for the response. I'm not sure I understand what you mean though. If a program is syntactically valid C, then the grammar must parse it correctly -- otherwise it's not the grammar of C. Ditto with the converse. If you define the grammar to be something that accepts syntactically invalid C programs as well, then how do you define context-free-ness? It's *always* possible to make an overly-broad CFG for a CSL (just accept every "tricky" string and leave the rest to semantic analysis...). What's your definition and what *would* it classify as context-sensitive, if not C? Why? – user541686 Jun 16 '14 at 19:27
26

From C99 6.5.17:

As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call

f(a, (t=3, t+2), c)

the function has three arguments, the second of which has the value 5.
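The standard's example can be tried directly. This is a small sketch in which second_of_three is a hypothetical stand-in for f that returns its second argument so the value can be inspected; the variable values are arbitrary.

```c
/* Hypothetical stand-in for f: hands back its second argument. */
int second_of_three(int first, int second, int third) {
    (void)first;
    (void)third;
    return second;
}

int standard_example(void) {
    int a = 1, c = 9, t;
    /* The two commas at argument-list level are separators, giving three
       arguments. The comma inside the parentheses is the comma operator:
       t = 3 is evaluated first, then the value of the argument is t + 2. */
    return second_of_three(a, (t = 3, t + 2), c);
}
```

Here standard_example() returns 5, matching the value the standard gives for the second argument.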

Another similar example is the initializer list of arrays or structs:

int array[5] = {1, 2};
struct Foo bar = {1, 2};
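The same distinction can be seen in an initializer list. The sketch below assumes C99 or later (so that a local array may have non-constant initializers); the function names are made up for illustration.

```c
/* The comma separates two initializers: array[0] = 1, array[1] = 2. */
int second_element_with_separator(void) {
    int array[2] = {1, 2};
    return array[1];
}

/* The parentheses turn the comma into the comma operator, so there is only
   one initializer. array[0] gets the value of `two`, and array[1] is
   zero-initialized because it has no initializer of its own. */
int second_element_with_operator(void) {
    int one = 1, two = 2;
    int array[2] = {((void)one, two)};
    return array[1];
}
```

The first function returns 2 and the second returns 0, because the parenthesized version supplies a single initializer.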

To actually use the comma operator inside a function argument, wrap it in parentheses, like this:

sum((a,b))

This won't compile, of course, because it passes only one argument (the value of b) to sum, which expects two.

Yu Hao
  • 5
    True but not an answer to the question. – bmargulies Jun 29 '13 at 18:38
  • @Yu: I didn't want to use the comma operator. I just want to know how the compiler knows that it is not a comma operator! – haccks Jun 29 '13 at 18:51
  • @sasha.sochka See the OP's comment. He wants to know how parsers work, not how to use a comma in a function call. – bmargulies Jun 29 '13 at 18:56
  • 1
    @haccks Got it, edited my words. Using a comma operator as the function parameter is not practically useful, but knowing how to use it is still interesting, so I'll keep this part though. – Yu Hao Jun 29 '13 at 19:01
  • @YuHao; Thanks dude! at least. And also thanks for edit to my post. – haccks Jun 29 '13 at 19:07
19

The reason is the C Grammar. While everyone else seems to like to cite the example, the real deal is the phrase structure grammar for function calls in the Standard (C99). Yes, a function call consists of the () operator applied to a postfix expression (like for example an identifier):

 6.5.2 postfix-expression:
       ...
       postfix-expression ( argument-expression-list_opt )

together with

argument-expression-list:
       assignment-expression
       argument-expression-list , assignment-expression    <-- arglist comma

expression:
       assignment-expression
       expression , assignment-expression                  <-- comma operator

The comma operator can only occur in an expression, i.e. further down in the grammar. So the compiler treats a comma in a function argument list as the one separating assignment-expressions, not as one separating expressions.
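The two grammar rules map naturally onto two functions in a hand-written recursive-descent parser. The following is only a toy sketch under stated assumptions: the grammar is cut down to identifiers, numbers, +, =, parentheses and calls; the left side of = is not checked; and all names (Parser, classify_commas, and so on) are invented for this example. It counts how many commas were matched by each rule.

```c
#include <ctype.h>

/* Toy parser state: input cursor plus one counter per kind of comma. */
typedef struct {
    const char *p;      /* next unread character */
    int separators;     /* commas matched by argument-expression-list */
    int operators;      /* commas matched by expression */
    int ok;             /* cleared on a syntax error */
} Parser;

static void parse_expression(Parser *ps);
static void parse_assignment(Parser *ps);

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Consume character c if it is next; report whether it was. */
static int accept(Parser *ps, char c) {
    skip_ws(ps);
    if (*ps->p != c) return 0;
    ps->p++;
    return 1;
}

/* argument-expression-list: assignment-expression { , assignment-expression } */
static void parse_arguments(Parser *ps) {
    parse_assignment(ps);
    while (ps->ok && accept(ps, ',')) {
        ps->separators++;
        parse_assignment(ps);
    }
}

/* postfix: ( expression ) or a name/number, followed by any call suffixes */
static void parse_postfix(Parser *ps) {
    if (accept(ps, '(')) {
        parse_expression(ps);           /* parentheses reopen the full rule */
        if (!accept(ps, ')')) ps->ok = 0;
    } else if (isalnum((unsigned char)*ps->p)) {
        while (isalnum((unsigned char)*ps->p)) ps->p++;
    } else {
        ps->ok = 0;
        return;
    }
    while (ps->ok && accept(ps, '(')) { /* function call suffix */
        if (!accept(ps, ')')) {
            parse_arguments(ps);
            if (!accept(ps, ')')) ps->ok = 0;
        }
    }
}

/* assignment-expression, simplified: postfix { + postfix } [ = assignment ] */
static void parse_assignment(Parser *ps) {
    parse_postfix(ps);
    while (ps->ok && accept(ps, '+')) parse_postfix(ps);
    if (ps->ok && accept(ps, '=')) parse_assignment(ps);
}

/* expression: assignment-expression { , assignment-expression } */
static void parse_expression(Parser *ps) {
    parse_assignment(ps);
    while (ps->ok && accept(ps, ',')) {
        ps->operators++;
        parse_assignment(ps);
    }
}

/* Parse src as an expression; returns 1 on success and fills both counters. */
int classify_commas(const char *src, int *separators, int *operators) {
    Parser ps = { src, 0, 0, 1 };
    parse_expression(&ps);
    skip_ws(&ps);
    if (!ps.ok || *ps.p != '\0') return 0;
    *separators = ps.separators;
    *operators = ps.operators;
    return 1;
}
```

For the input "f(a, (t=3, t+2), c)" this reports two separators and one operator. The only thing that decides which counter a comma lands in is which function happened to be running when it was seen, which is the point of the grammar excerpt above.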

Jens
  • 1
    @haccks: a conditional-expression or a unary-expression followed by an assignment-operator followed by an assignment-expression. – Jens Jun 29 '13 at 19:28
  • 1
    I didn't get your point; please elaborate. It would be appreciated. – haccks Jun 29 '13 at 20:31
  • 4
    To expand a bit on @Jens answer: let's change the problem and simplify it. Instead of "expressions" let's have golf balls (painted yellow) and also big clear plastic balls that can be opened up and have stuff stuck inside them: `(` stuff `)`. The grammar says, in effect, that you may have yellow golf balls, which are automatically separated. Or, you may provide a clear ball *as long you've used both halves*. The clear ball works as a unit, it can't be opened up and separated. So: f( (a,b), g ) has one "clear ball" (a,b) and one "yellow ball" g and hence exactly two balls, er, arguments. – torek Jun 30 '13 at 01:54
  • 2
    I ran out of comment room, so, continued, and back to the real C grammar: the parentheses allow you to escape out to a "full blown" expression, where commas are comma expression parts. Until you have an "extra" open parenthesis, though, you're in this more limited "assignment-expression" sub-grammar (like the "yellow golf balls" idea), where commas are simply not allowed. If the parser comes across a comma in this context, it has to stop and finish the assignment-expression. This works because `(` "finishes off" with `)`: the bracketing ends the full expression context. – torek Jun 30 '13 at 02:00
  • @torek; I didn't get your line: ` *as long you've used both halves* ` (sorry for my bad English). – haccks Jun 30 '13 at 17:19
  • @torek and also line:` *the parentheses allow you to escape out to a "full blown" expression, where commas are comma expression parts* ` – haccks Jun 30 '13 at 17:43
  • 2
    Hm, I don't have any other natural language to express this. Consider `{` … `}`, `[` … `]`, and `(` … `)`. They "match up": if you write `a[3}` it's obviously wrong. If you write `a[(3]` it's still obviously wrong. `(` is ended only by the matching `)`. That "closes off" the whole sequence, making it clear what goes with what. – torek Jun 30 '13 at 23:59
11

Existing answers say "because the C language spec says it's a list separator, and not an operator".

However, your question is asking "how does the compiler know...", and that's altogether different. It's really no different from how the compiler knows that the comma in printf("Hello, world\n"); isn't a comma operator: the compiler 'knows' because of the context in which the comma appears, that is, because of what has gone before.

The C 'language' can be described in Backus-Naur Form (BNF): essentially, a set of rules that the compiler's parser uses to scan your input file. The BNF for C will distinguish between these different possible occurrences of commas in the language.

There are lots of good resources on how compilers work, and how to write one.

Community
Roddy
6

The draft C99 standard says:

As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call f(a, (t=3, t+2), c) the function has three arguments, the second of which has the value 5.

In other words, "because".

unwind
  • 6
    my kids don't take that for an answer why should the OP... but that is the reason, because the ambiguous case is prohibited. – Grady Player Jun 29 '13 at 20:28
1

There are multiple facets to this question. One part is that the definition says so. Well, how does the compiler know what context this comma is in? That's the parser's job. For C in particular, the language can be parsed by an LR(1) parser (http://en.wikipedia.org/wiki/Canonical_LR_parser), provided the parser is told which identifiers are type names.

The way this works is that a parser generator produces a set of tables that describe the possible states of the parser. Only a certain set of symbols is valid in each state, and the same symbol may have a different meaning in different states. The parser knows that it is parsing a function call because of the preceding symbols. Thus, it knows that in its current state a comma can only be an argument separator, not the comma operator.
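The idea that a symbol's meaning depends on the current state can be illustrated without any LR(1) tables. The sketch below is not how a real compiler is built: it is a hand-written scanner whose entire state is a nesting depth and an "inside a string literal" flag, and count_arguments is a hypothetical helper that takes the text between a call's outer parentheses. It assumes C99 for the loop declaration.

```c
/* Count the arguments of a call, given the text between its outer
   parentheses. Only (), [], {} nesting and "..." literals are understood. */
int count_arguments(const char *args) {
    int depth = 0;        /* number of brackets currently open */
    int in_string = 0;    /* nonzero while inside a "..." literal */
    int seen_token = 0;   /* nonzero once any non-space character is seen */
    int count = 0;        /* separator commas found so far */

    for (const char *p = args; *p != '\0'; p++) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0') p++;   /* skip escaped character */
            else if (*p == '"') in_string = 0;
            continue;                              /* commas here are text */
        }
        if (*p == ' ' || *p == '\t' || *p == '\n') continue;
        seen_token = 1;
        if (*p == '"') in_string = 1;
        else if (*p == '(' || *p == '[' || *p == '{') depth++;
        else if (*p == ')' || *p == ']' || *p == '}') depth--;
        else if (*p == ',' && depth == 0) count++; /* separator of this call */
        /* a comma at depth > 0 belongs to a nested expression or call */
    }
    return seen_token ? count + 1 : 0;
}
```

With the argument text of the standard's example, "a, (t=3, t+2), c", only the two commas at depth 0 are counted, so the result is three arguments. The comma in "Hello, world\n" is skipped because the scanner is in its string state when it sees it.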

I am being very general here, but you can read all about the details in the Wiki.

John Tseng