I am trying to apply heading styles like the ones in MS Word by extracting the RTF strings of Word's heading styles. The RTF string works well for English text and applies the formatting to it, but when it is applied to Urdu text, it produces formatted "????".

Let me explain with an example:

I select a word written in Urdu, such as "اللغة العربية", and I already have an RTF string containing the RTF of an MS Word heading style: {\rtf1\ansi\ansicpg1252... "اللغة العربية"...}. I insert the selected text into this string to get formatted text.

But instead of the formatted اللغة العربية, I get formatted question marks ("????"), which I think is an encoding or font problem. How can I apply an RTF string to Urdu text and get correctly formatted text?

crthompson
Shehdi
  • Please read [Know About Unicode and Character Sets](http://www.joelonsoftware.com/articles/Unicode.html) by Joel. Then add code to your post. – Alexei Levenkov Oct 22 '13 at 06:43
  • `\ansi\ansicpg1252` gives a hint that you probably cannot put Unicode in there. But I don't know RTF all that well. – Joey Oct 22 '13 at 06:47
  • Alexei, I will do as you said, but first, could you explain what Joey pointed out: should the ANSI encoding be some kind of Unicode instead? So – Shehdi Oct 22 '13 at 07:41
  • Run Wordpad.exe, paste that text, save the file. You'll have the RTF you need. – Hans Passant Oct 22 '13 at 10:27
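Joey's hint about `\ansi\ansicpg1252` points at one plausible source of the question marks: if the Arabic-script text ends up being converted through a single-byte Western code page, every letter that code page cannot represent is replaced by '?'. The minimal sketch below illustrates that effect. `Encoding.ASCII` stands in for Windows-1252 as an assumption (neither contains Arabic letters, and ASCII needs no extra code-page provider), and the class and method names are hypothetical.

```csharp
using System.Text;

public static class CodePageLossSketch
{
    // Round-trips text through a single-byte encoding. Characters the
    // encoding cannot represent are replaced by its default fallback, '?'.
    // ASCII stands in for Windows-1252 here (assumption): neither has Arabic letters.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

For example, `RoundTrip("Heading \u0627\u0644\u0644\u063A\u0629")` gives `"Heading ?????"`: the ASCII part survives and each Arabic letter becomes '?'. That matches the symptom in the question, where the formatting control words (plain ASCII) still work but the text does not.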

1 Answer

You need a function that converts the Unicode characters in the string to their corresponding RTF escape codes:

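using System.Text;

static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        // Escape the RTF control characters \ { }
        if (c == '\\' || c == '{' || c == '}')
            sb.Append('\\').Append(c);
        // 7-bit ASCII can be written as-is
        else if (c <= 0x7f)
            sb.Append(c);
        // Anything else becomes \uN? with N as a signed 16-bit decimal
        // (RTF expects code units above 32767 as negative numbers);
        // '?' is the fallback shown by readers without Unicode support
        else
            sb.Append("\\u").Append((short)c).Append('?');
    }
    return sb.ToString();
}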

Found here: https://stackoverflow.com/a/9988686/1543816

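Characters whose value is greater than 127 (0x7F) are converted to \uN?, where N is the character's UTF-16 code unit written in decimal (not hex), and the trailing ? is a fallback character for RTF readers that do not understand \u.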
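As a rough usage sketch, the escaped text can then be spliced into a complete RTF document. The template here is a hypothetical stand-in for the Word heading RTF in the question, not what Word actually emits: a bold 16 pt (`\fs32`, in half-points) right-to-left paragraph (`\rtlpar`), with font 0 declared using the Arabic character set (`\fcharset178`). The class name `RtfHeadingSketch`, the method `BuildRtlHeading`, and the default font "Arial" are assumptions. `Escape` mirrors the answer's function, writing N as a signed 16-bit value.

```csharp
using System.Text;

public static class RtfHeadingSketch
{
    // Mirrors GetRtfUnicodeEscapedString, writing \uN with N as a signed
    // 16-bit decimal, as the RTF spec expects for values above 32767.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (var c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);          // escape RTF control characters
            else if (c <= 0x7f)
                sb.Append(c);                       // plain ASCII passes through
            else
                sb.Append("\\u").Append((short)c).Append('?'); // \uN? with '?' fallback
        }
        return sb.ToString();
    }

    // Hypothetical heading template (assumption, not Word's Heading 1 RTF):
    // bold, 16 pt, right-to-left paragraph, font 0 with the Arabic charset.
    public static string BuildRtlHeading(string text, string fontName = "Arial")
    {
        return "{\\rtf1\\ansi\\ansicpg1252\\deff0"
             + "{\\fonttbl{\\f0\\fnil\\fcharset178 " + fontName + ";}}"
             + "\\pard\\rtlpar\\f0\\fs32\\b "
             + Escape(text)
             + "\\b0\\par}";
    }
}
```

For example, `RtfHeadingSketch.Escape("\u0627\u0644\u0644\u063A\u0629")` yields `\u1575?\u1604?\u1604?\u1594?\u1577?`, which contains only ASCII and therefore survives the `\ansicpg1252` document encoding. In a WinForms app, the string from `BuildRtlHeading` could be assigned to a `RichTextBox`'s `Rtf` property; how well it renders still depends on an installed font that covers Urdu.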

Community
Jerry