Hang on, this will be long! I'll need to explain a few things before asking my questions.
According to the C++ standard (and as described in this question and its answers), a compiler should support Unicode (and, more precisely, UTF-8 in source) in the names of identifiers (variables, functions, etc.). I know that Clang supports that fully (I mean you can use UTF-8 encoded source files) and GCC supports it only if you use \u codes in the identifiers, but let's assume we live in a perfect world where this works properly on all compilers.
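To make the \u form concrete, here is a minimal sketch of an identifier spelled with a universal character name. The function name sum_below and its body are made up for illustration; the only point is the spelling \u0438, which is the code point of the Cyrillic letter used as the loop counter further down.

```cpp
#include <cassert>

// Hypothetical helper: sums the integers 0 .. limit-1.
// The loop counter is spelled \u0438 (CYRILLIC SMALL LETTER I),
// so the source file itself stays pure ASCII.
inline int sum_below(int limit) {
    assert(limit >= 0);  // the sketch only handles non-negative limits
    int total = 0;
    for (int \u0438 = 0; \u0438 < limit; ++\u0438) {
        total += \u0438;
    }
    return total;
}
```

It compiles, but nobody would want to read or write a whole program spelled this way, which is why I assume direct UTF-8 support from here on.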
That is great! Now I no longer have to write my code in English and can finally do it in my native Bulgarian, or maybe Esperanto. That's the point of this requirement of the standard, after all. Except there is still a huge problem with that. Let's see some (not really meaningful) code:
First using identifiers in English (ASCII):
int i = 0;
while(i < 100)
{
    auto f = static_cast<float>(i);
    std::string currentName = "name_" + toString(f);
    std::cout << getPrettyName(currentName) << ": " << getSalary(currentName) << std::endl;
    ++i;
}
And then using identifiers in Bulgarian (as it shows the problem very clearly):
int и = 0;
while(и < 100)
{
    auto д = static_cast<float>(и);
    std::string текущоИме = "име_" + превърниВНиз(д);
    std::cout << красивоИме(текущоИме) << ": " << заплата(текущоИме) << std::endl;
    ++и;
}
As you can see, the second code is still mainly in English because of keywords and the standard library. There are two problems with that:
- It doesn't help non-English-speaking Bulgarians understand the code (assuming they do not know C++ that well). They still have to know English to be proper programmers, and isn't removing that requirement part of the point of this whole thing?
- What is actually worse, at least for me, is that this is very annoying to write. Those of you who speak a language whose alphabet is not based on the Latin script know that to write in a different alphabet you have to switch the keyboard layout (most people use Alt+Shift). I had to switch the layout 4 times to write each line. This is very annoying, and slow.
The same goes for all languages that are not based on the Latin script: Chinese, Arabic, Russian, Hindi, …
The obvious solution (at least for me) is that the C++ language should support localised keywords (and standard library classes) in order for this whole Unicode-identifiers thing to make any sense. That has been done for ALGOL 68 and possibly others, and there are other more modern examples in the same article. That way the code in Bulgarian would look better and be much easier to write (I don't claim that the Bulgarian words used must be exactly these):
цяло и = 0;
докато(и < 100)
{
    авт д = статично_преобр<дробно>(и);
    стд::низ текущоИме = "име_" + превърниВНиз(д);
    стд::изх << красивоИме(текущоИме) << ": " << заплата(текущоИме) << стд::кред;
    ++и;
}
So, on to the questions:
- Is this actually allowed/possible according to the standard right now? I may be missing something…
- Is there any way to make a decent workaround myself? Macros would work for the keywords, but that would be awful. using would work for standard library classes (namespace стд { using низ = std::string; }), but there is no way to deal with methods (std::string::size() -> размер()?) apart from subclassing… or is there?
- In case that is not possible or even considered, how should one go about suggesting this idea to the C++ gurus that make the standard?
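To show what I mean by a do-it-yourself workaround in the second question, here is a rough sketch. It assumes a compiler that accepts UTF-8 identifiers directly (Clang does; as far as I know GCC does too from version 10). The names стд, низ, размер, преброй and демо are all made up by me, and the free function размер is only one possible stand-in for a method, not something the standard library offers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical "localised" namespace: aliases only, no new types.
namespace стд {
    using низ = std::string;

    // A member function cannot be aliased, so a free function wraps it.
    // The call syntax becomes размер(н) instead of н.размер().
    inline std::size_t размер(const низ& н) { return н.size(); }
}

// The alias names exactly the same type as std::string.
static_assert(std::is_same<стд::низ, std::string>::value,
              "an alias does not create a new type");

// Keywords can only be renamed with macros (the awful part).
#define цяло int
#define докато while

// Counts the chars in a string using the renamed keywords.
inline цяло преброй(const стд::низ& дума) {
    цяло и = 0;
    докато (static_cast<std::size_t>(и) < стд::размер(дума)) {
        ++и;
    }
    return и;
}

// Small self-check; call it from main() to exercise the sketch.
inline void демо() {
    стд::низ име = "name_1";
    assert(стд::размер(име) == 6);
    assert(преброй(име) == 6);
}
```

This covers keywords and class names, but the method problem is only dodged: the wrapper changes the call syntax, and every member function I want to use would need its own wrapper.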
Just to be clear, I don't mean that there should be different versions of C++ for the different languages; rather, it should be possible for it to support all of them at once via some setting or include or whatever, if needed.