Turkish characters 'ÇçĞğİıÖöŞşÜü' are not handled correctly in UTF-8 encoding, although they all seem to be defined. In use, the char code of every one of them comes back as 65533 (U+FFFD, the replacement character, which stands in for data that could not be decoded), and a question mark or a box is displayed depending on the selected font. In some cases 0/null is returned as the char code. There are many tools on the internet that give UTF-8 definitions for these characters, but I am not sure whether those tools use a real, international registry or create the definitions dynamically from known rules and calculations. Fonts for these characters are well defined, and they display without problems when we enter the code points manually. This shows that they are defined in UTF-8. On the other hand, they are not handled correctly in encodings or transformations such as AJAX requests/responses.
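A minimal sketch of how the 65533 symptom can arise, assuming (the post does not state this) that the text was saved in a legacy Turkish code page such as ISO-8859-9 while being read as UTF-8. The variable names are illustrative only.

```python
# Assumption: the source text was stored as ISO-8859-9 (Latin-5), not UTF-8.
text = "\u00C7\u00E7\u011E\u011F\u0130\u0131\u00D6\u00F6\u015E\u015F\u00DC\u00FC"  # ÇçĞğİıÖöŞşÜü

# Bytes as a legacy Turkish code page would store them (one byte per letter).
legacy_bytes = text.encode("iso8859_9")

# Reading those bytes as UTF-8 fails; errors="replace" substitutes U+FFFD.
garbled = legacy_bytes.decode("utf-8", errors="replace")
codes = [ord(ch) for ch in garbled]
print(codes)

# When the same encoding is used on both sides, the text survives intact.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)
```

If this assumption holds, the characters themselves are fine and the replacement characters come from a mismatch between the encoding the bytes were written in and the encoding they were read as.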
So the basic question is: "HOW CAN WE DEFINE A CODE POINT FOR A CHARACTER?" To prevent misunderstanding, the question can be spelled out as follows. Suppose we have prepared the encoding data for "Ç" like this:

Character: Ç
Character name: LATIN CAPITAL LETTER C WITH CEDILLA
Hex code point: 00C7
Decimal code point: 199
Hex UTF-8 bytes: C387
......

Where and how can we save this information so that it becomes a standard UTF-8 character?
How can we distribute/expose it (make it ready to be used by others)?
Do we need confirmation from anybody or any foundation (such as the Unicode Consortium)?
How can we detect and fix errors if the characters are already registered but not working correctly?
Can we have a custom UTF-8 configuration? If yes, how?
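The values listed for "Ç" can be cross-checked against an existing Unicode implementation. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module, which carries a copy of the Unicode Character Database; the variable names are illustrative only.

```python
import unicodedata

ch = "\u00C7"  # Ç

# Name and code point come from the Unicode Character Database.
name = unicodedata.name(ch)
code_point = ord(ch)
hex_code_point = f"{code_point:04X}"

# The UTF-8 bytes are computed from the code point by the encoder.
utf8_hex = ch.encode("utf-8").hex().upper()

print(name, hex_code_point, code_point, utf8_hex)
```

The UTF-8 bytes follow mechanically from the code point, so only the code point and name are looked up in a table.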
Note: No code snippet from my application is included here, as this is not a misuse problem.
������������ and ABCDEF; when we leave out the charset declaration we get -> ÇçĞğİıÖöŞşÜü and ABCDEF
– İlhan ÇELİK Feb 05 '13 at 13:59