
I came across this obscure thing ... I would like to know whether the @ sign can appear anywhere in the source of a valid C/C++ application, besides the following situations:

  • a const char* value such as const char* addr = "xyz@gmail.com"
  • a const char value, such as char c = '@'
  • a macro which is never used: #define NEVER_EVER ABC@
  • in a commented out section

Reason for asking: curiosity :)

Ferenc Deak
  • `in a commented out section` pretty much anything can be there. :-) – Sourav Ghosh Feb 23 '15 at 11:36
  • There should possibly be separate questions for C and C++; the syntaxes are quite different. I can say that for C, there is no situation where an @ would be valid syntax, and it is not an allowed char in tokens either. But I don't know about C++ – Vality Feb 23 '15 at 11:36
  • See [this question](http://stackoverflow.com/questions/24114365/does-at-symbol-and-dollar-sign-has-any-special-meaning-in-c-or-c). – Daniel Kleinstein Feb 23 '15 at 11:36
  • I just tried `gcc -fextended-identifiers` and it still told me that `@` was a stray.... – Iharob Al Asimi Feb 23 '15 at 11:37
  • No, unless it is some sort of extension (like Objective-C, which technically is an extension to C or C++). – Öö Tiib Feb 23 '15 at 11:39
  • I think you've pretty much covered all the cases. The first two cases are trivially true. Just about any visible character is valid between double quotes (string) or as a single character between single quotes, as long as escaping rules are followed as needed. And as @SouravGhosh said, of course you can put anything in a comment. You can even put in control characters (a favorite way to do form feeds in a printed listing, back when folks did that, was to have a comment `/* ^L */`). – lurker Feb 23 '15 at 11:44
  • Why is this tagged security-by-obscurity? – Stefano Sanfilippo Feb 23 '15 at 11:46
  • @StefanoSanfilippo An educated guess would be that OP wants to use `@` to obfuscate his code. – Daniel Kleinstein Feb 23 '15 at 11:47
  • @DanielKleinstein you're getting close :) Indeed, I'm researching various obscure ways to obfuscate my code – Ferenc Deak Feb 23 '15 at 11:49
  • Perhaps interesting: I thought another exception might be the delimiter in C++11's raw string literals, but it's not allowed even there. An obvious addition to your rules is a macro which *is* used, but where the expansion is stringized. –  Feb 23 '15 at 11:57

2 Answers


I would answer for the C language. Note that there isn't any such thing as C/C++. Both are separate languages and C is not a subset of C++.

Besides the possibilities you described, @ can also appear in header names, though that is not common practice:

main.c:

#include <stdio.h>

#include "fancy@header.h"

int main(void)
{
    foo();

    return 0;
}

fancy@header.h:

static void foo(void)
{
    printf("whatever\n");
}

For a Standard reference, see C11 §5.2.1/p3, which defines the basic source and basic execution character sets; neither includes the @ character. The same paragraph also lists the places where such a character may still appear (emphasis mine):

In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line. If any other characters are encountered in a source file (**except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token**), the behavior is undefined.
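As a rough illustration of the last item in that list, here is a minimal sketch of @ showing up only in preprocessing tokens that are never converted to tokens. The macro `STR` and the function names are made up for this example, and the behavior described is what I would expect from GCC or Clang rather than something the Standard promises for every implementation:

```c
#include <assert.h>
#include <string.h>

/* Never expanded, so the stray '@' is never converted to a token. */
#define NEVER_EVER ABC@

/* Hypothetical helper: stringizing turns the argument's spelling into a string literal. */
#define STR(x) #x

#if 0
This group is skipped by the preprocessor, so @ is harmless here.
#endif

/* Returns the text produced by stringizing an argument that contains '@'. */
const char *stringized_address(void)
{
    return STR(user@example.com);
}

/* Returns '@' taken from an ordinary character constant. */
char at_sign(void)
{
    return '@';
}

/* Self-check: the stringized text contains the same '@' character. */
void check_at_sign_cases(void)
{
    assert(strcmp(stringized_address(), "user@example.com") == 0);
    assert(strchr(stringized_address(), at_sign()) != NULL);
}
```

Calling `check_at_sign_cases()` from `main` should pass silently. Using `NEVER_EVER` in ordinary code would put the stray @ into the token stream and the compiler would reject it.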

For identifiers, see C11 §6.4.2.1/p3:

Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in D.1. The initial character shall not be a universal character name designating a character whose encoding falls into one of the ranges specified in D.2. An implementation may allow multibyte characters that are not part of the basic source character set to appear in identifiers; which characters and their correspondence to universal character names is implementation-defined.

The (normative) annex section D.1 lists the ranges of allowed characters. As you can check, the @ character is U+0040 in UCS, which lies outside the allowed ranges:

00A8, 00AA, 00AD, 00AF, 00B2−00B5, 00B7−00BA, 00BC−00BE, 00C0−00D6, 00D8−00F6, 00F8−00FF (...)
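To make the comparison concrete, here is a small sketch that checks a code point against only the ranges quoted above. The names `d1_excerpt` and `in_d1_excerpt` are invented for this example, and the table deliberately stops where the quote stops, so it is not the full D.1 list:

```c
#include <assert.h>
#include <stddef.h>

/* One inclusive range of code points. */
struct range { unsigned long lo, hi; };

/* Only the part of D.1 quoted above; the remaining ranges are omitted. */
static const struct range d1_excerpt[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
};

/* Returns 1 if cp falls in one of the excerpted ranges, 0 otherwise. */
int in_d1_excerpt(unsigned long cp)
{
    size_t i;
    for (i = 0; i < sizeof d1_excerpt / sizeof d1_excerpt[0]; i++) {
        if (cp >= d1_excerpt[i].lo && cp <= d1_excerpt[i].hi)
            return 1;
    }
    return 0;
}

/* Self-check: '@' (U+0040) sits below the lowest quoted entry, 00A8. */
void check_at_sign_excluded(void)
{
    assert(!in_d1_excerpt(0x0040));
}
```

With this excerpt, `in_d1_excerpt(0x0040)` is 0, while a letter such as U+00E9 falls inside 00D8−00F6.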

Even so, a compiler might allow the @ character as a language extension. C11 J.5.2/p1, Specialized identifiers (Common extensions), says:

Characters other than the underscore _, letters, and digits, that are not part of the basic source character set (such as the dollar sign $, or characters in national character sets) may appear in an identifier (6.4.2).

For instance, GCC allows the $ sign in identifiers as a GNU extension:

In GNU C, you may normally use dollar signs in identifier names. This is because many traditional C implementations allow such identifiers. However, dollar signs in identifiers are not supported on a few target machines, typically because the target assembler does not allow them.
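A minimal sketch of that extension, assuming GCC (or a compatible compiler) on a target where dollar signs in identifiers are enabled. The names are made up, and other compilers or stricter diagnostic settings may reject or warn about this code:

```c
#include <assert.h>

/* Relies on the GNU extension: '$' inside an identifier. Not portable. */
int price$usd = 42;

/* Returns the value stored in the '$'-named variable. */
int read_dollar_identifier(void)
{
    return price$usd;
}

/* Self-check: the identifier behaves like any other. */
void check_dollar_identifier(void)
{
    assert(read_dollar_identifier() == 42);
}
```

Replacing the `$` with `@` in the same position is not covered by this extension; as noted in the comments on the question, GCC reports the `@` as a stray character.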

Peter Mortensen
Grzegorz Szpetkowski

None of the cases listed above causes any problem.

The @ is invalid in names (variables, functions, classes, etc.). Some linkers do use the @ character, read as "at", to relate symbols to libraries: run nm on some of your executables on Linux and you'll see something like malloc@@GLIBC_2.2.5, which means malloc taken from GLIBC_2.2.5.

In strings and characters, the only problematic characters are the \, which is also the escape character, and the delimiters themselves: the " in strings and the ' in characters must be escaped so that they are not read as the end of the string or character.
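A short sketch of the difference, with variable names chosen just for this example: @ goes into a literal as-is, while the backslash and the matching quote need an escape.

```c
#include <assert.h>
#include <string.h>

/* '@' needs no escaping in a string literal or a character constant. */
const char *addr = "xyz@gmail.com";
const char at = '@';

/* These three must be escaped inside the corresponding literal. */
const char *with_quote = "say \"hi\"";   /* " inside a string */
const char *with_backslash = "C:\\temp"; /* \ inside a string */
const char apostrophe = '\'';            /* ' inside a character constant */

/* Self-check on what the literals actually contain. */
void check_literals(void)
{
    assert(strchr(addr, at) == addr + 3);  /* '@' is the 4th character */
    assert(strlen(with_quote) == 8);       /* escapes count as one char each */
    assert(strlen(with_backslash) == 7);
    assert(apostrophe == "'"[0]);
}
```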

In comments, there aren't any limitations except the */ in a multi-line comment, which closes the comment.

A never-used macro does not really exist after preprocessing, so there isn't any problem at all.

Peter Mortensen
SHR
  • '@' is **not** part of the required *basic source character set*, so it could cause problems in a conforming implementation. (N4296 2.3 [lex.charset] p1) – BoBTFish Feb 23 '15 at 11:44
  • The other two "aliens" out of the set of printable ASCII-7 characters that are not part of the *basic source character set* are the dollar sign `$` and the backtick. – DevSolar Feb 23 '15 at 11:59