Is using non-Latin characters in an #error directive allowed by the C++ Standard?

For example, I would like to write an error message in Russian:

#error Сообщение об ошибке
int main() { }
αλεχολυτ
  • Possible duplicate of [Using Unicode in C++ source code](https://stackoverflow.com/questions/331690/using-unicode-in-c-source-code) – Federico klez Culloca Mar 04 '19 at 10:31
  • It seems the standard says nothing about this. It should be up to the compiler, e.g. [MS VC++ /source-charset:IANA_name|.CPI](https://learn.microsoft.com/en-us/cpp/build/reference/source-charset-set-source-character-set?view=vs-2017) or [GCC -fexec-charset=charset](https://gcc.gnu.org/onlinedocs/cpp/Invocation.html). In any case, this is generally bad practice, since Russian/Greek compile-time error messages cannot be read by everyone who works with the code. – Victor Gubin Mar 04 '19 at 10:35
  • *"since Russian/Greek compile-time error messages cannot be read by everyone who works with the code"* @VictorGubin you could say that about English too. – Federico klez Culloca Mar 04 '19 at 10:37
  • @FedericoklezCulloca Since the C++ grammar, the standard library, and almost all other libraries use English, I would say no, you cannot say that about English too. English is more or less mandatory when programming. – bolov Mar 04 '19 at 10:41
  • The implementation defines what characters are in the source character set. The standard doesn't mandate anything beyond the basic ASCII-ish set. So the standard allows it to work or to fail, at the discretion of each implementation. – n. m. could be an AI Mar 04 '19 at 10:42
  • @FedericoklezCulloca perhaps you're right :) – Victor Gubin Mar 04 '19 at 10:43
  • When you tried it, what did it do? – Eljay Mar 04 '19 at 12:53
  • @Eljay In C, "it worked when I tried it" is NOT a reliable guide to whether something can be safely done. – zwol Mar 04 '19 at 15:35

1 Answer

Whether or not you can put non-ASCII characters in an #error directive's argument is "locale-specific" according to C2011 5.2.1p1. The tokens on the line after #error contain characters that are not part of the basic source character set; whether or not they are valid as part of the extended source character set is locale-specific. Per annex J.4, locale-specific behavior is required to be documented, just like implementation-defined behavior.

The difference between locale-specific and implementation-defined behavior is that there might be several locales, each with its own set of extended source characters. Perhaps only some of those extended source character sets include Cyrillic. These aspects of the C standard were last revised in 1999, before Unicode took over the world, so they're worrying about scenarios such as feeding a source file encoded in ISO 8859-5 to a compiler that expects extended source characters to conform to EUC-JP.
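To make the mismatch scenario concrete, here is a minimal sketch of how the same Cyrillic letter turns into different bytes under two encodings. The helper names `to_utf8_2byte` and `to_iso8859_5` are made up for this illustration, and the ISO 8859-5 helper only covers the contiguous range U+0410..U+044F (the basic Russian letters), so it is not a general converter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a code point in U+0080..U+07FF
// as its two-byte UTF-8 sequence.
std::string to_utf8_2byte(char32_t cp) {
    assert(cp >= 0x80 && cp <= 0x7FF);
    std::string out;
    out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte: 110xxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation: 10xxxxxx
    return out;
}

// Hypothetical helper: encode a basic Russian letter (U+0410..U+044F)
// as a single ISO 8859-5 byte. In that encoding the range maps
// linearly onto 0xB0..0xEF.
std::string to_iso8859_5(char32_t cp) {
    assert(cp >= 0x0410 && cp <= 0x044F);
    return std::string(1, static_cast<char>(cp - 0x0410 + 0xB0));
}
```

For the letter U+0421 (the first letter of the message in the question), `to_utf8_2byte(U'\u0421')` gives the bytes D0 A1, while `to_iso8859_5(U'\u0421')` gives the single byte C1. A compiler that expects one encoding and is fed the other sees a different character sequence, or an invalid one. The code points are written as `\u` escapes so that the sketch itself does not depend on the source file's encoding.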

Regardless of how you actually encode your source files and whether that matches what the compiler expects, what you're trying to do is more likely to work if you use a string literal as the argument of #error:

#error "Сообщение об ошибке"

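This is because some compilers allow a broader variety of characters in string literals than they do in identifiers. Without the quotes, each word of the message is lexed as an identifier-like preprocessing token, so it is subject to the stricter rules.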
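As a rough sketch of how the quoted form might sit in real code, the directive can be placed behind a configuration check so that it fires only for a broken build. The macro names `REQUIRE_FEATURE_X` and `HAVE_FEATURE_X` and the function `build_guard_passed` are placeholders invented for this example, and the file is assumed to be saved in an encoding your compiler accepts (UTF-8 is the usual choice).

```cpp
#include <cassert>

// Hypothetical configuration check: the error fires only when the
// build asks for a feature that the platform does not provide.
#if defined(REQUIRE_FEATURE_X) && !defined(HAVE_FEATURE_X)
#error "Сообщение об ошибке"
#endif

// Placeholder function: if this translation unit compiled at all,
// the guard above did not trigger.
constexpr bool build_guard_passed() { return true; }
```

With neither macro defined, the `#error` line is skipped and the file compiles normally. Defining `REQUIRE_FEATURE_X` alone should stop compilation, and whether the message is then displayed legibly still depends on the compiler and the terminal.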

zwol