1

I am using a hidden RichTextBox to retrieve the Text property from a RichEditCtrl. rtb->Text returns the text portion in either English or national languages, which is just great!

But I need this text as \u12232?\u32232? escapes instead of national characters and symbols, so that it works with my database and RichEditCtrl. Any idea how to get from “пассажирским поездом Невский” to “\u12415?\u12395?\u23554?\u20219?\u30456?\u35527?\u21729?” (where each national character is represented as “\u23232?”)?

If you have, that would be great. I am using Visual Studio 2008 C++ with a combination of MFC and managed code.

Cheers and have a wonderful weekend

val

2 Answers

0

If you need a System::String as an output as well, then something like this would do it:

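using namespace System;
using namespace System::Text;

String^ s = rtb->Text;
StringBuilder^ sb = gcnew StringBuilder(s->Length);
for (int i = 0; i < s->Length; ++i) {
    // "\\u" is an escaped backslash; a bare "\u" would start a universal character name in C++.
    sb->AppendFormat("\\u{0:D5}?", (int)s[i]);
}
String^ result = sb->ToString();  // read the result from the builder, not from s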

By the way, are you sure the format is as described? \u is traditionally an escape sequence for a hexadecimal Unicode code point, exactly 4 hex digits long, e.g. \u0F3A. It's also not normally followed by ?. If you actually want that, the format specifier {0:X4} should do the trick.
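For comparison, here is a minimal sketch of the same loop in standard C++. The samples quoted in the question (\u1086?, \u957?, \u20219?) look like unpadded decimal values followed by a literal ?, which appears to match the way RTF writes Unicode escapes, so the sketch emits decimal without padding. The function name to_rtf_escapes is hypothetical, and the signed 16-bit handling is an assumption about what the RTF reader expects.

```cpp
#include <string>

// Hypothetical helper (not from the answer above): escape every UTF-16
// code unit as an RTF-style "\uN?" sequence, with N written in decimal.
std::string to_rtf_escapes(const std::u16string& text) {
    std::string out;
    for (char16_t unit : text) {
        int value = static_cast<int>(unit);
        // Assumption: the reader treats N as a signed 16-bit number,
        // so values above 32767 are written as negative numbers.
        if (value > 32767) {
            value -= 65536;
        }
        out += "\\u" + std::to_string(value) + "?";
    }
    return out;
}
```

Like the loop above, this escapes every character, including plain ASCII. Whether ASCII should be passed through unchanged depends on what the database and the control expect.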

Pavel Minaev
  • Thanks Pavel. I will try your suggestion tonight. As for the format, those \u12395?\u23554? etc. are UNICODE hex as you correctly pointed out. I cut and pasted an example from my debug output, so the "?" really follows the \u with 5 or 3 digits depending on language. Where are you from? – val Nov 27 '09 at 22:04
  • If they are hex, there shouldn't be 5 digits in them, as `\u` only permits four (and requires exactly four) in C++ and C#. What "debug output" did you cut? – Pavel Minaev Nov 27 '09 at 22:40
  • Pavel, you are right about the 5 digits for Asian languages like Chinese (\u20219?\u30456?\u35527?) or Russian (\u1086?\u1084?\u1099?) in my example above. From what I get for Greek, it seems only 3 digits are available, as here: "\u957? \u946?\u959?" All of my examples are snippets only, not full sentences. This is my very first project in international languages and I am having lots of fun ;-) I am just back from University after a long day and a few beers. I'll try the stuff tomorrow with a fresh head. Privet – val Nov 28 '09 at 02:43
0

You don't need to use escaping to put formatted Unicode in a RichText control. You can use UTF-8. See my answer here: Unicode RTF text in RichEdit.

I'm not sure what your restrictions are on your database, but maybe you can use UTF-8 there too.

asveikau
  • Wow, great answer article! If I got it right, your PSTR Utf8; will look like "\u12395?\u23554?\u20219?" when PWSTR WideString = "Сотрудники главного управления МЧС ". Am I correct? – val Nov 27 '09 at 22:10
  • The way UTF-8 works, each WCHAR value that is > 128 will be represented as anywhere from 2 to 4 CHAR values... For example L"д" will be "\xd0\xb4". You can read more about how it works at the Wikipedia article for UTF-8. – asveikau Nov 28 '09 at 00:08
  • Thanks again buddy, you've helped a lot. Cheers, I need a break after a party with my friends at university tonight. I'll try to fix my code tomorrow ;-) – val Nov 28 '09 at 02:46
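To make the UTF-8 comment above concrete, here is a rough sketch of how a single code point turns into bytes. The helper name encode_utf8 is hypothetical, and the sketch assumes its input is a valid code point (no surrogate halves, nothing above U+10FFFF). On Windows the conversion would more typically be done with WideCharToMultiByte and CP_UTF8 rather than by hand.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid code point; no validation is performed.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII range: one byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx (covers Cyrillic and Greek).
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (rest of the BMP).
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, encode_utf8(0x0434) (the letter д) gives the two bytes "\xd0\xb4" mentioned in the comment. The result contains no \uN? escapes at all, only raw bytes.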