Prevent C# Rich Text Box from converting Asian characters to Hexadecimal

Question

I have a simple C# Windows Forms app that converts rich text (pasted from PDF or DOC) into RTF markup. However, whenever I paste in Asian characters like this: ありがとうございました the RTF markup converts the characters to their hexadecimal equivalents like this: \'82\'a0\'82\'e8\'82\'aa\'82\'c6\'82\'a4\'82\'b2\'82\'b4\'82\'a2\'82\'dc\'82\'b5\'82\'bd

Anyone know if it's possible to prevent this and have the RTF markup retain the actual Asian characters? I know it's theoretically possible because if I paste the actual characters into the RTF markup window they do not get converted.

Per request here's the actual code that pushes the RTF text into a plain text field ... which is the point where the Asian characters get converted:

private void rtfBox_TextChanged(object sender, EventArgs e){
    plainTextBox.Text = rtfBox.Rtf.ToString();
}

Source of the app here if anyone wants to see further: https://github.com/cemerson/RTFMarkupHelper

Related/Possible duplicates:

Check out this maybe? http://stackoverflow.com/questions/3446649/richtextbox-rtf-return-unicode-format-or-ansi-format — David Oesterreich, Apr 08 '16 at 20:47
`Build of the app here if anyone wants to see it in action` post your minimal, compilable code to reproduce the problem here. — Eser, Apr 08 '16 at 20:54

Kateract · Answer 1 · 2016-04-08T21:52:13.937

-1

It is preserving the Asian characters. The hexadecimal escapes (`\'hh`) encode the bytes of characters that cannot be written directly in the 7-bit ASCII text an RTF file is made of. Add the following code under a "Save RTF" button and you'll be able to save the markup as a .rtf file and then open it in WordPad or a similar program, where the original characters are displayed.

    // Requires: using System.IO; using System.Windows.Forms;
    private void button6_Click(object sender, EventArgs e)
    {
        using (SaveFileDialog sfd = new SaveFileDialog())
        {
            sfd.Filter = "RTF File|*.rtf";
            sfd.DefaultExt = ".rtf";
            if (sfd.ShowDialog() == DialogResult.OK)
            {
                // File.WriteAllText replaces any existing file; FileInfo.OpenWrite
                // would leave stale bytes behind when overwriting a longer file.
                File.WriteAllText(sfd.FileName, textBox.Text);
            }
        }
    }
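
To see that the escapes still carry the original text, here is a minimal sketch that decodes runs of `\'hh` escapes back into characters. It assumes the markup was produced with Windows code page 932 (Shift-JIS); the question does not state the code page, but the bytes shown there are consistent with it. `RtfHexEscapeDemo` and `DecodeHexEscapes` are hypothetical names, not part of the app or of any library.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RtfHexEscapeDemo
{
    static RtfHexEscapeDemo()
    {
        // Needed on .NET Core / .NET 5+ to make code page 932 available.
        // On .NET Framework the code page is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Replaces each run of \'hh escapes with the text those bytes represent
    // in the given code page. Everything else in the fragment is left alone.
    public static string DecodeHexEscapes(string rtfFragment, int codePage)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);
        return Regex.Replace(rtfFragment, @"(?:\\'[0-9a-fA-F]{2})+", match =>
        {
            string[] hexPairs = match.Value.Split(
                new[] { @"\'" }, StringSplitOptions.RemoveEmptyEntries);
            byte[] bytes = new byte[hexPairs.Length];
            for (int i = 0; i < hexPairs.Length; i++)
            {
                bytes[i] = Convert.ToByte(hexPairs[i], 16);
            }
            return encoding.GetString(bytes);
        });
    }

    public static void Main()
    {
        // The escaped string from the question.
        string escaped =
            @"\'82\'a0\'82\'e8\'82\'aa\'82\'c6\'82\'a4\'82\'b2\'82\'b4\'82\'a2\'82\'dc\'82\'b5\'82\'bd";

        Console.OutputEncoding = Encoding.UTF8;
        // Assumption: code page 932. With it, this prints ありがとうございました
        Console.WriteLine(DecodeHexEscapes(escaped, 932));
    }
}
```

A real RTF document declares its code page in the header (for example through `\ansicpg`), so a more careful version would read it from there instead of hard-coding 932.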

edited Apr 08 '16 at 21:52

answered Apr 08 '16 at 21:04


I appreciate your suggestion here but in my situation I need to prevent the characters from being converted in the first place. Maybe it's not possible? – Christopher Apr 11 '16 at 09:40
@ChrisEmerson It's not possible: RTF files are encoded in 7-bit ASCII, so characters outside ASCII can only be represented through escape sequences. The [Character Encoding](https://en.wikipedia.org/wiki/Rich_Text_Format#Character_encoding) section of the Wikipedia article explains this fairly well. – Kateract Apr 11 '16 at 15:36
It certainly looks grim but I don't want to give up yet. The thing that bugs me is that I can manually paste the Asian characters into the plain text box myself as well as into the RTF text box, so I know those characters can "live" in either place. The problem seems to be the conversion process. My other challenge is figuring out the difference between character strings like this: "\u20302" and this: "\p20\a23\e44". Anyhow, you may very well be right, but I want to hold out hope a bit longer and tinker/research a bit more. – Christopher Apr 12 '16 at 16:13
@ChrisEmerson The standard text box supports Unicode characters; it's the RTF markup that can't contain them directly. The `Rtf` property of the rich text box, which you are using to fill the regular text box, returns the actual RTF markup. – Kateract Apr 13 '16 at 14:38
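
On the `\u` style of escape mentioned in the comments: besides `\'hh`, RTF has a `\uN` escape, where N is the UTF-16 code unit written as a signed 16-bit decimal number, followed by a fallback character for readers that do not understand the escape. The sketch below writes text in that form. It is only an illustration of the notation, not what `RichTextBox.Rtf` produced in the question, and `RtfUnicodeEscapeDemo` and `EscapeNonAscii` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class RtfUnicodeEscapeDemo
{
    // Escapes RTF control characters and rewrites every non-ASCII UTF-16
    // code unit as \uN? (N is a signed 16-bit decimal, '?' is the fallback).
    public static string EscapeNonAscii(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);
            }
            else if (c < 0x80)
            {
                sb.Append(c);
            }
            else
            {
                // Code units above 0x7FFF come out negative, as RTF expects.
                short value = unchecked((short)c);
                sb.Append(@"\u")
                  .Append(value.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // あ is U+3042, which is 12354 in decimal; prints \u12354?
        Console.WriteLine(EscapeNonAscii("あ"));
    }
}
```

Either way the markup itself stays pure ASCII; only the escape notation differs.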
