Does Scintilla really support Unicode? If so, why does SCI_GETCHARAT return a char value (cast to LRESULT)?

2 Answers
From the SCI_SETCODEPAGE docs...
Code page SC_CP_UTF8 (65001) sets Scintilla into Unicode mode with the document treated as a sequence of characters expressed in UTF-8. The text is converted to the platform's normal Unicode encoding before being drawn by the OS and thus can display Hebrew, Arabic, Cyrillic, and Han characters.
You will have to examine the byte you retrieve with SCI_GETCHARAT(pos) and, depending on the top bits of that, maybe read SCI_GETCHARAT(pos+1) and beyond in order to get the Unicode code point. (See here.)
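The byte-by-byte decoding described above can be sketched roughly as follows. This is a minimal illustration, not Scintilla's own code: `DecodeCodePointAt` and its `byteAt` callback are hypothetical names, with `byteAt` standing in for a `SCI_GETCHARAT` call whose result is treated as an unsigned byte. It assumes the document is in `SC_CP_UTF8` mode and does not reject overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not part of Scintilla): decodes the UTF-8 sequence
// starting at byte offset `pos`. `byteAt` stands in for SCI_GETCHARAT.
// On malformed or truncated input it returns U+FFFD and sets length to 1.
char32_t DecodeCodePointAt(const std::function<unsigned char(std::size_t)>& byteAt,
                           std::size_t pos, std::size_t docLength, int& length) {
    const char32_t kReplacement = 0xFFFD;
    length = 1;
    if (pos >= docLength) return kReplacement;

    const unsigned char lead = byteAt(pos);
    if (lead < 0x80) return lead;  // 0xxxxxxx: plain ASCII, one byte

    // The top bits of the lead byte say how many continuation bytes follow.
    int extra = 0;
    char32_t cp = 0;
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }  // 11110xxx
    else return kReplacement;  // continuation byte or invalid lead byte

    if (pos + extra >= docLength) return kReplacement;  // sequence is cut off

    for (int i = 1; i <= extra; ++i) {
        const unsigned char b = byteAt(pos + i);
        if ((b & 0xC0) != 0x80) return kReplacement;  // expected 10xxxxxx
        cp = (cp << 6) | (b & 0x3F);
    }
    length = extra + 1;
    return cp;
}
```

In a real host application, `byteAt` would presumably wrap something like sending `SCI_GETCHARAT` to the editor window and casting the result to `unsigned char`, with `docLength` taken from `SCI_GETLENGTH`. The returned `length` tells the caller how far to advance to reach the next character.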
Edit:
For some C++ code that does this, see below (search for SciMoz::GetWCharAt):
http://vacuproj.googlecode.com/svn/trunk/npscimoz/npscimoz/oldsrc/trunk.nsSciMoz.cxx

-
+1 huh, interesting... doesn't that mean you can't randomly access a character, though? How does the editor work with large files then? – user541686 Jun 09 '11 at 02:31
-
It probably does complicate that, although usually one would be working with caret or selection positions which would already have compensated for multibyte characters, presumably. As for your second question, I'm not sure I see a problem, but the [Scintilla source code](http://www.scintilla.org/SciTEDownload.html) would surely be enlightening :-) – Martin Stone Jun 09 '11 at 07:17
It was a long time ago, but if I remember correctly, Scintilla is not a native Unicode application. Still, it has some Unicode support.
First, the function should really be named SCI_GETBYTEAT, because it returns a single byte from the internal UTF-8 buffer.
Also, the application handles Unicode keyboard input, so it does have some Unicode support :)

-
Oh wow, I just saw your edit. +1 okay... but how does it open large files without random indexing? – user541686 Jun 09 '11 at 02:32