1

struct birthday{ int day:6; }b-day;

When I declare b-day as a structure variable, the compiler shows the following error:

error: expected ':', ',', ';', '}' or '__attribute__' before '-' token

After removing the hyphen from the variable name it works. Why?

Roberto Caboni
  • Because you just can't put a hyphen in an identifier name. The syntax forbids it. And thank god it does, I can't imagine the amount of trouble allowing hyphens in names would cause. – mediocrevegetable1 Aug 12 '21 at 11:27
  • `b-day` is the same as `b - day` which consists of 3 tokens. Some languages do allow `-` in identifiers but C is not one of them. It's a little bit unfortunate because `-` makes it much easier to read than `_`, and it forces the programmer to use proper spacing around operators – phuclv Aug 12 '21 at 11:29
  • The hyphen is not a valid identifier character. It has other uses in the C language. Identifiers in C may contain letters, digits, and underscores, and they must begin with a non-digit. – Tom Karzes Aug 12 '21 at 11:29
  • Why can't you name your child "Is"? Because it would be too confusing. "Is Is a boy or a girl?" – Steve Summit Aug 12 '21 at 14:54

4 Answers

5

Hyphens are used as subtraction and negation operators, so they cannot be used in variable names. (Whether the variable is for a structure or another type is irrelevant.)

If you had:

int a = 1;
int b = 2;
int a-b = 3;
printf("%d\n", a-b);

then we would have ambiguity about whether to print “-1” for a minus b or to print “3” for the variable a-b.

Eric Postpischil
2

Because C doesn't allow hyphens in identifier names.

Basically, you can only use letters, digits, and underscores. Also, a digit is not allowed as the first character.

Quote from N1570 6.4.2 Identifiers:

Syntax

identifier:
    identifier-nondigit
    identifier identifier-nondigit
    identifier digit

identifier-nondigit:
    nondigit
    universal-character-name
    other implementation-defined characters

nondigit: one of
    _ a b c d e f g h i j k l m
      n o p q r s t u v w x y z
      A B C D E F G H I J K L M
      N O P Q R S T U V W X Y Z

digit: one of
    0 1 2 3 4 5 6 7 8 9
MikeCAT
  • I think it's UB to start with underscore – klutt Aug 12 '21 at 11:30
  • @klutt according to [Can an underscore be used anywhere in an identifier in C?](https://stackoverflow.com/a/19972251), if you start your identifier with one underscore then the following character can't be an uppercase letter, and if it's a lowercase letter, it should only have file scope. If you abide by that rule, I think it should be fine. – mediocrevegetable1 Aug 12 '21 at 11:33
1

The boring answer is that the language definition doesn't allow - to be part of an identifier (variable name, function name, typedef name, enumeration constant, tag name, etc.).

Why that's the case probably boils down to a couple of things:

At the preprocessing stage, your source text is broken up into a sequence of tokens - identifiers, punctuators, string literals, and numeric constants. Whitespace is not significant except that it separates tokens of the same type. If you write a=b+c;, the compiler sees the sequence of tokens identifier (a), punctuator (=), identifier (b), punctuator (+), identifier (c), and punctuator (;). This is before it does any syntax analysis - it's not looking at the meaning or the structure of that statement, it's just breaking it down into its component parts.

It can do this because the characters = and + and ; can never be part of an identifier, so it can clearly see where identifiers begin and end[1].

The tokenizer is "greedy" and will build the longest valid token it can. In a declaration like

int a;

you need the whitespace to tell the preprocessor that int and a are separate tokens; otherwise it will mash them together into a single token, inta. Similarly, in a statement like a=b- -c;, you need that whitespace (or parentheses, a=b-(-c);) to signify you're subtracting -c from b. Without it, the tokenizer reads a=b--c; as a = b -- c, which is a syntax error rather than the subtraction you wanted.

So, if a - could be part of an identifier, how should x=a-b+c be tokenized? Is a-b a single token or three? How would you write your tokenizer such that it could keep track of that? Would you require whitespace before and after - to signify that it's an operator and not part of a variable?

It's certainly possible to define a language that allows - to be both an operator and part of an identifier (see COBOL), but it adds complexity to the tokenizing stage of compiling, and it's just plain easier to not allow it.


  1. Coincidentally, this is why there's no difference between T *p; and T* p; when declaring pointer variables - the * can never be part of an identifier, so whitespace isn't necessary to separate the type from the variable name. You could write it as T*p; or even T * p; and it will be treated exactly the same.
John Bode
  • The simplest way to allow `-` in identifiers is to just require spaces around binary operators. That's not only good for readability but also follows the normal spacing rules of writing – phuclv Aug 13 '21 at 01:39
0

It is because the symbol '-' may not be used in an identifier name. When it appears between other symbols, the compiler treats it as the binary or unary minus operator, depending on the context.

The error message

error: expected ':', ',', ';', '}' or '__attribute__' before '-' token

means that the compiler tries to interpret the declaration as something like

    struct birthday{
        int day:6;
    }b; -day;

You could declare the structure like

    struct birthday{
        int day:6;
    } b_day;

that is, using the underscore instead of the hyphen.

Vlad from Moscow