I have a method that converts strings to RTF-Strings.

For that I use the RichTextBox that .NET provides, in the way described here:
How to convert a string to RTF in C#?

When I enter ő, it returns {\rtf1 {\'f5\f1}}. But that seems to be õ, because that is the symbol I get when I put it into an .rtf file.

Why does that happen? And what can I do to solve this issue?

EDIT:

Here is the whole method as I use it:

private static string ConvertToRtf(string text) {
    // Dispose the control when done; it wraps a native window.
    using (var richTextBox = new System.Windows.Forms.RichTextBox()) {
        richTextBox.Text = text;
        string rtf = richTextBox.Rtf;
        // Keep only the body between the "\f0\fs17" prefix and the last "\par".
        int offset = rtf.IndexOf(@"\f0\fs17") + 8;
        int length = rtf.LastIndexOf(@"\par") - offset;
        return rtf.Substring(offset, length).Substring(1);
    }
}
Torben L.
  • Which encoding do you use? And what encoding is the RTF string in? – jgauffin Oct 31 '12 at 11:40
  • I set no encoding. And I thought it was a huge advantage of RTF that it has no different encodings? By the way, I attached my code to my question. – Torben L. Oct 31 '12 at 11:47
  • RTF is not a Unicode encoding, it predates that standard and uses a crazy charset scheme. I can't repro your problem but we don't know anything about the default code page on your machine. Post the entire RTF string so we can see the code page. – Hans Passant Oct 31 '12 at 12:51
  • @HansPassant The whole RTF string of the rtfBox looks like this: {\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1031{\\fonttbl{\\f0\\fnil\\fcharset238 Microsoft Sans Serif;}{\\f1\\fnil\\fcharset0 Microsoft Sans Serif;}}\r\n\\viewkind4\\uc1\\pard\\f0\\fs17\\'f5\\f1\\par\r\n}\r\n – Torben L. Oct 31 '12 at 15:08

1 Answer

The whole RTF string of the rtfBox looks like this (etc..)

That's fine and displays correctly. Your code snippet, however, doesn't make sense. You cannot just take a sliver of RTF and hope it displays properly. The \f0 is particularly important: it selects the font and, with it, the charset. In this case that is character set 238, the charset for Eastern European languages. Note how the RTF contains the \fonttbl command that defines f0.

So if you copy that sliver of RTF and use it elsewhere, like in some other RTB that was not initialized with the same \fonttbl commands, then you'll get a character from the wrong charset. Like charset 0, which indeed displays õ instead.
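
To see why the same \'f5 byte turns into two different letters, here is a minimal sketch that decodes it once per charset. It assumes that \fcharset238 corresponds to Windows code page 1250 and that \fcharset0 under \ansicpg1252 corresponds to code page 1252. The RegisterProvider line is an assumption about your runtime: it is needed on .NET Core 3.0 and later, and should be dropped on the classic .NET Framework, where these code pages are available by default.

```csharp
using System;
using System.Text;

class CharsetDemo
{
    static void Main()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+. Remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The single byte that the RTF escape \'f5 stands for.
        byte[] raw = { 0xF5 };

        // Decoded as Eastern European (code page 1250, assumed for \fcharset238).
        string eastEuropean = Encoding.GetEncoding(1250).GetString(raw);

        // Decoded as Western (code page 1252, assumed for \fcharset0).
        string western = Encoding.GetEncoding(1252).GetString(raw);

        Console.WriteLine((int)eastEuropean[0]); // 337, which is U+0151 (ő)
        Console.WriteLine((int)western[0]);      // 245, which is U+00F5 (õ)
    }
}
```

The byte never changes; only the font table entry that tells the reader how to interpret it does.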

Well, now you know why Unicode was invented ;)

The workaround is to only copy text from the RichTextBox.Text property. That's Unicode.
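
If you really do need an RTF fragment that survives being pasted into a document with a different font table, one alternative (not what the RichTextBox produces, and not part of the workaround above) is to write non-ASCII characters as RTF Unicode escapes of the form \uN?, which do not depend on \fcharset. The sketch below is a hypothetical helper, EscapeForRtf, that assumes the target document uses \uc1 (one fallback character after each escape, as in the RTF string you posted). It does not translate line breaks into \par or \line.

```csharp
using System;
using System.Text;

static class RtfEscaper
{
    // Hypothetical helper, not part of .NET: escapes plain text for use inside an RTF body.
    public static string EscapeForRtf(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF control characters must be backslash-escaped.
                sb.Append('\\').Append(c);
            }
            else if (c < 0x80)
            {
                // Plain ASCII passes through unchanged.
                sb.Append(c);
            }
            else
            {
                // \uN takes a signed 16-bit decimal value, followed by one
                // fallback character ('?') for readers without Unicode support.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

For example, RtfEscaper.EscapeForRtf("ő") returns \u337?, since ő is U+0151, which is 337 in decimal.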

Hans Passant