In C++, we can use a wide variety of Unicode characters in identifiers. For example, you could name a variable résumé.
Those accented e's can be represented in different ways: either as a precomposed character (U+00E9) or as a plain e followed by a combining accent character (U+0301). Many applications normalize such strings so that seemingly identical strings actually match.
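To make the two representations concrete, here is a minimal sketch that spells the same word both ways using UTF-8 byte escapes, so the difference doesn't depend on how an editor saves the file. The names precomposed, decomposed, compose_e_acute, and check_forms are made up for illustration. The toy function rewrites only this one pair, whereas real normalization (such as NFC) covers far more.

```cpp
#include <cassert>
#include <string>

// Both strings spell "résumé" in UTF-8, written with byte escapes.
// Precomposed: each accented letter is U+00E9 (bytes C3 A9).
const std::string precomposed = "r\xC3\xA9sum\xC3\xA9";
// Decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT (bytes CC 81).
const std::string decomposed = "re\xCC\x81sume\xCC\x81";

// Toy normalizer (hypothetical helper, NOT real NFC): rewrites only the
// sequence 'e' + U+0301 into the precomposed U+00E9.
std::string compose_e_acute(const std::string& s) {
    static const std::string from = "e\xCC\x81";
    static const std::string to = "\xC3\xA9";
    std::string out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        if (s.compare(pos, from.size(), from) == 0) {
            out += to;            // replace the two-code-point sequence
            pos += from.size();
        } else {
            out += s[pos];        // copy every other byte unchanged
            ++pos;
        }
    }
    return out;
}

// Small self-check: the raw strings differ, but match once composed.
void check_forms() {
    assert(precomposed != decomposed);
    assert(compose_e_acute(decomposed) == precomposed);
}
```

A byte-for-byte comparison treats the two spellings as different strings. They match only after one of them has been normalized. A compiler comparing identifiers faces the same choice.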
Looking at the C++ standard, I don't see anything that requires the compiler to normalize identifiers, so a variable named résumé with precomposed characters could be distinct from a variable named résumé with combining accents. (In my tests, neither MSVC nor clang appears to normalize identifiers.)
Is there anything that prohibits the compiler from choosing a normal form? If not, at what phase of translation should normalization occur?
[To be clear: I'm talking about identifiers, not string literals.]