Question

How do I convert the string "Européen" to the RTF-formatted string "Europ\'e9en"?

[TestMethod]
public void Convert_A_Word_To_Rtf()
{
    // Arrange
    string word = "Européen";
    string expected = @"Europ\'e9en"; // verbatim string, so \' is not read as a C# escape
    string actual = string.Empty;

    // Act
    // actual = ... // How?

    // Assert
    Assert.AreEqual(expected, actual);
}

What I have found so far

RichTextBox

RichTextBox can be used for certain things. Example:

RichTextBox richTextBox = new RichTextBox();
richTextBox.Text = "Européen";
string rtfFormattedString = richTextBox.Rtf;

But then rtfFormattedString turns out to be the entire RTF-formatted document, not just the string "Europ\'e9en".

Stackoverflow

Google

I've also found a bunch of other resources on the web, but nothing quite solved my problem.

Answer

Brad Christie's answer

I had to add Trim() to remove the preceding space in the result. Other than that, Brad Christie's solution seems to work.

I'll run with this solution for now, even though I have a bad gut feeling about it, since we have to Substring and Trim the heck out of RichTextBox to get an RTF-formatted string.

Test case:

[TestMethod]
public void Test_To_Verify_Brad_Christies_Stackoverflow_Answer()
{
    Assert.AreEqual(@"Europ\'e9en", "Européen".ConvertToRtf());
    Assert.AreEqual(@"d\'e9finitif", "définitif".ConvertToRtf());
    Assert.AreEqual(@"\'e0", "à".ConvertToRtf());
    Assert.AreEqual(@"H\'e4user", "Häuser".ConvertToRtf());
    Assert.AreEqual(@"T\'fcren", "Türen".ConvertToRtf());
    Assert.AreEqual(@"B\'f6den", "Böden".ConvertToRtf());
}

Logic as an extension method:

public static class StringExtensions
{
    public static string ConvertToRtf(this string value)
    {
        RichTextBox richTextBox = new RichTextBox();
        richTextBox.Text = value;
        int offset = richTextBox.Rtf.IndexOf(@"\f0\fs17") + 8; // offset = 118;
        int len = richTextBox.Rtf.LastIndexOf(@"\par") - offset;
        string result = richTextBox.Rtf.Substring(offset, len).Trim();
        return result;
    }
}
Lernkurve
  • possible duplicate of [Output RTF special characters to Unicode](http://stackoverflow.com/questions/1310694/output-rtf-special-characters-to-unicode) – Abe Miessler Jan 25 '11 at 16:05
  • @Abe Miessler: I had seen that question and have added the link to my question above. However, I don't quite see how that solves my problem (it probably does, but I don't get it). Could you perhaps provide a code snippet that makes the above test method pass? – Lernkurve Jan 25 '11 at 16:12
  • Check out my answer again, I've posted a (hacky) solution to your question. My hope is you're only translating some minor/simpler things. – Brad Christie Jan 25 '11 at 16:18
  • @Lernkurve: Saw an up-vote for this and thought I'd follow-up; is this still working out well? (I'm a little curious how insecure this method was) – Brad Christie Oct 16 '12 at 15:19
  • @BradChristie: I can't really tell you how insecure this method is. We didn't check whether it works in every single possible case. But since we were unsure, we wrote tests for all the special characters we were interested in and it worked great for those ones. – Lernkurve Oct 18 '12 at 16:23

8 Answers

9

Doesn't RichTextBox always have the same header/footer? You could just read the content based on its offset location and continue using it to parse. (I think? Please correct me if I'm wrong.)

There are libraries available, but I've never had good luck with them personally (though I always found another method before fully exhausting the possibilities). In addition, most of the better ones usually involve a nominal fee.


EDIT
Kind of a hack, but this should get you through what you need to get through (I hope):

RichTextBox rich = new RichTextBox();
Console.Write(rich.Rtf);

String[] words = { "Européen", "Apple", "Carrot", "Touché", "Résumé", "A Européen eating an apple while writing his Résumé, Touché!" };
foreach (String word in words)
{
    rich.Text = word;
    Int32 offset = rich.Rtf.IndexOf(@"\f0\fs17") + 8;
    Int32 len = rich.Rtf.LastIndexOf(@"\par") - offset;
    Console.WriteLine("{0,-15} : {1}", word, rich.Rtf.Substring(offset, len).Trim());
}

EDIT 2

The RTF control codes break down as follows:

  • Header
    • \f0 - Use the font at index 0 (the first font in the list, which is typically Microsoft Sans Serif, as noted in the font table in the header: {\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}})
    • \fs17 - Font size; the value is in half-points, so 17 means 8.5 pt
  • Footer
    • \par marks the end of a paragraph.

Hopefully that clears some things up. ;-)
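Since the snippet above depends on the literal text \f0\fs17, here is a minimal sketch of the same extraction that tolerates a different font size. It works on a plain string, so the RichTextBox is not needed to try it out. The class and method names (RtfBodyExtractor, ExtractBody) are made up for illustration, and it assumes the document has the simple single-paragraph shape shown in the breakdown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RtfBodyExtractor
{
    // Matches "\f0\fsN" followed by an optional space, for any font size N.
    // "\fswiss" in a font table is not matched because \fs must be followed by digits.
    private static readonly Regex BodyStart = new Regex(@"\\f0\\fs\d+ ?");

    // Returns the text between the first "\f0\fsN" and the last "\par".
    // Assumption: the input is a single-paragraph document such as the one
    // RichTextBox.Rtf produces for plain text.
    public static string ExtractBody(string rtfDocument)
    {
        Match header = BodyStart.Match(rtfDocument);
        if (!header.Success)
            throw new FormatException(@"No \f0\fsN control words found.");

        int start = header.Index + header.Length;
        int end = rtfDocument.LastIndexOf(@"\par", StringComparison.Ordinal);
        if (end < start)
            throw new FormatException(@"No closing \par found after the header.");

        return rtfDocument.Substring(start, end - start).Trim();
    }
}
```

Passing rich.Rtf to ExtractBody would replace the IndexOf/LastIndexOf arithmetic in the loop above. It is still a hack that relies on how the control lays out its output.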

Brad Christie
  • Your method will fail if the header's length ever changes. My method will fail if the header ever includes the string, "!!@@!!" (but will not fail if the input string contains "!!@@!!"). – Brian Jan 25 '11 at 16:25
  • @Brian: My headers do change. the difference between outputting "Apple" and "Européen" make the header change. – Brad Christie Jan 25 '11 at 16:29
  • @Brad Christie: I mean if "\f0\fs17" ever changes. I admit this is unlikely. I guess I just have an aversion to depending on implementation details. – Brian Jan 25 '11 at 16:44
  • @Brad Christie: I've added your code to my post above and it seems to work. Thanks. :-) I'll report back after trying it out some more and figuring out what you are actually doing there. ;-) Perhaps you can help me out by explaining what @"\f0\fs17" and @"\par" stands for since I don't know the RTF specification. – Lernkurve Jan 25 '11 at 16:48
  • @Lernkurve: Updated the answer once again, please see the bottom. ;-) – Brad Christie Jan 25 '11 at 17:43
  • @Brad Christie: Thanks a lot for your Edit 2. I added an updated test case to my question above. I replaced my +9 again with your initial +8 since it didn't work for "à". But then I had to add a Trim() to keep it working for all the other cases. – Lernkurve Jan 26 '11 at 16:51
  • Not a problem, glad to help. And I thought there was a reason I used 8, I just didn't trim it. (Both fixes have been applied to my answer.) Also, I can say pretty confidently that this should work in a lot of scenarios. As long as you don't get into carriage returns, this should fit your need(s). Hacky, yes, but given minimal effort/expense, I think this fits the bill. ;-) – Brad Christie Jan 26 '11 at 17:00
  • @Brad Christie: I absolutely agree with you, and you saying that it should work in a lot of scenarios makes me sleep better. ;-) P.S. Have another look at the test case in my answer: I added some extension method sweetness. – Lernkurve Jan 26 '11 at 17:32
5

This is how I went:

private string ConvertString2RTF(string input)
{
    //first take care of special RTF chars
    StringBuilder backslashed = new StringBuilder(input);
    backslashed.Replace(@"\", @"\\");
    backslashed.Replace(@"{", @"\{");
    backslashed.Replace(@"}", @"\}");

    //then convert the string char by char
    StringBuilder sb = new StringBuilder();
    foreach (char character in backslashed.ToString())
    {
        if (character <= 0x7f)
            sb.Append(character);
        else
            sb.Append("\\u" + Convert.ToUInt32(character) + "?");
    }
    return sb.ToString();
}

I think using a RichTextBox is overkill, and I don't like RichTextBox anyway after spending days trying to make it work with an RTF document created in Word.
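The method above emits \uN? escapes, whereas the question expects the \'xx form. As a rough sketch, the same character loop could produce that form for the Latin-1 range. This assumes the surrounding document declares \ansicpg1252, where bytes 0xA0 to 0xFF coincide with the Unicode code points. The names RtfEscaper and EscapeForRtf are invented for this example. It also writes N as a signed 16-bit number, which is how the RTF specification defines the \u argument.

```csharp
using System.Globalization;
using System.Text;

public static class RtfEscaper
{
    // Escapes plain text for use inside an RTF document body.
    // Assumption: the document uses code page 1252, so characters
    // U+00A0..U+00FF can be written as \'xx with xx = the code point.
    public static string EscapeForRtf(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF special characters get a leading backslash.
                sb.Append('\\').Append(c);
            }
            else if (c <= 0x7f)
            {
                sb.Append(c);
            }
            else if (c >= 0xa0 && c <= 0xff)
            {
                // Two lowercase hex digits, e.g. é (U+00E9) becomes \'e9.
                sb.Append(@"\'").Append(((int)c).ToString("x2", CultureInfo.InvariantCulture));
            }
            else
            {
                // Everything else: \uN? with N as a signed 16-bit value.
                short signed = unchecked((short)c);
                sb.Append(@"\u").Append(signed.ToString(CultureInfo.InvariantCulture)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this, RtfEscaper.EscapeForRtf("Européen") returns Europ\'e9en, while a character outside Latin-1 such as ł falls back to \u322?.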

Beetee
5

I found a nice solution that actually uses the RichTextBox itself to do the conversion:

private static string FormatAsRTF(string DirtyText)
{
    System.Windows.Forms.RichTextBox rtf = new System.Windows.Forms.RichTextBox();
    rtf.Text = DirtyText;
    return rtf.Rtf;
}

http://www.baltimoreconsulting.com/blog/development/easily-convert-a-string-to-rtf-in-net/

Matthew Lock
1

I know it has been a while, hope this helps..

This code is working for me after trying every conversion code I could put my hands on:

titleText and contentText are simple text filled in a regular TextBox

var rtb = new RichTextBox();
rtb.AppendText(titleText);
rtb.AppendText(Environment.NewLine);
rtb.AppendText(contentText);

rtb.Refresh();

rtb.Rtf now holds the RTF text.

The following code will save the RTF text and allow you to open the file, edit it, and then load it back into a RichTextBox:

rtb.SaveFile(path, RichTextBoxStreamType.RichText);
Eibi
1

Below is an ugly example of converting a string to an RTF string:

class Program
{
    static RichTextBox generalRTF = new RichTextBox();

    static void Main()
    {
        string foo = @"Européen";
        string output = ToRtf(foo);
        Trace.WriteLine(output);
    }

    private static string ToRtf(string foo)
    {
        string bar = string.Format("!!@@!!{0}!!@@!!", foo);
        generalRTF.Text = bar;
        int pos1 = generalRTF.Rtf.IndexOf("!!@@!!");
        int pos2 = generalRTF.Rtf.LastIndexOf("!!@@!!");
        if (pos1 != -1 && pos2 != -1 && pos2 > pos1 + "!!@@!!".Length)
        {
            pos1 += "!!@@!!".Length;
            return generalRTF.Rtf.Substring(pos1, pos2 - pos1);
        }
        throw new Exception("Not sure how this happened...");
    }
}
Brian
  • Thanks for taking the time to post the code. I'll have to have a look at it. The exclamation marks look scary... – Lernkurve Jan 25 '11 at 16:54
  • @Lernkurve: The "!!@@!!" is an arbitrary delimiter and really belongs in a `const String`. – Brian Jan 25 '11 at 16:59
  • Thanks for the explanation. Should have seen that myself. :-) – Lernkurve Jan 25 '11 at 17:03
  • @Lernkurve: Basically, my solution and Brad's solution are the same, but Brad uses the naturally occurring text of the `RichTextBox` control to delimit the text, whereas I produce my own delimiter. I also added some error-checking in case "impossible" conditions occur. It's debatable whether this checking is appropriate. – Brian Jan 25 '11 at 17:18
1

Here's an improved version of @Vladislav Zalesak's answer:

public static string ConvertToRtf(string text)
{
    // using default template from wiki
    StringBuilder sb = new StringBuilder(@"{\rtf1\ansi\ansicpg1250\deff0{\fonttbl\f0\fswiss Helvetica;}\f0\pard ");
    foreach (char character in text)
    {
        if (character <= 0x7f)
        {
            // escaping rtf characters
            switch (character)
            {
                case '\\':
                case '{':
                case '}':
                    sb.Append('\\');
                    break;
                case '\r':
                    sb.Append("\\par");
                    break;
            }

            sb.Append(character);
        }
        // converting special characters
        else
        {
            sb.Append("\\u" + Convert.ToUInt32(character) + "?");
        }
    }
    sb.Append("}");
    return sb.ToString();
}
Gh61
0

Not the most elegant, but quite optimal and fast method:

public static string PlainTextToRtf(string plainText)
{
    if (string.IsNullOrEmpty(plainText))
        return "";

    string escapedPlainText = plainText.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");
    escapedPlainText = EncodeCharacters(escapedPlainText);

    string rtf = @"{\rtf1\ansi\ansicpg1250\deff0{\fonttbl\f0\fswiss Helvetica;}\f0\pard ";
    rtf += escapedPlainText.Replace(Environment.NewLine, "\\par\r\n ");
    rtf += " }";
    return rtf;
}

Encode characters (Polish ones) method:

private static string EncodeCharacters(string text)
{
    if (string.IsNullOrEmpty(text))
        return "";

    return text
        .Replace("ą", @"\'b9")
        .Replace("ć", @"\'e6")
        .Replace("ę", @"\'ea")
        .Replace("ł", @"\'b3")
        .Replace("ń", @"\'f1")
        .Replace("ó", @"\'f3")
        .Replace("ś", @"\'9c")
        .Replace("ź", @"\'9f")
        .Replace("ż", @"\'bf")
        .Replace("Ą", @"\'a5")
        .Replace("Ć", @"\'c6")
        .Replace("Ę", @"\'ca")
        .Replace("Ł", @"\'a3")
        .Replace("Ń", @"\'d1")
        .Replace("Ó", @"\'d3")
        .Replace("Ś", @"\'8c")
        .Replace("Ź", @"\'8f")
        .Replace("Ż", @"\'af");
}
Chris W
  • With your solution, how do you know that you have covered all possible special characters? If a character is not in your EncodeCharacters' Replace-list, then it will turn out wrong, right? – Lernkurve Jan 31 '13 at 19:08
  • In some rare exceptions this little hack can be a last resort. – Jan and RESTless May 12 '21 at 16:39
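One possible way around the coverage concern raised in the comments is to let a code-page encoding produce the bytes instead of maintaining a replacement list by hand. The sketch below is only an illustration: the names CodePageRtfEncoder and Encode are made up, it assumes the document header declares the same code page (1250 here, as in the answer), and it assumes .NET Core 3.0 or later, where CodePagesEncodingProvider has to be registered first. On .NET Framework the registration should not be needed.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CodePageRtfEncoder
{
    static CodePageRtfEncoder()
    {
        // Makes Windows code pages such as 1250 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Writes each non-ASCII character as \'xx using the given code page.
    // Characters the code page cannot represent fall back to \uN?.
    public static string Encode(string text, int codePage = 1250)
    {
        // Exception fallbacks make unmappable characters throw instead of
        // being silently replaced with '?'.
        Encoding encoding = Encoding.GetEncoding(
            codePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);
            }
            else if (c <= 0x7f)
            {
                sb.Append(c);
            }
            else
            {
                try
                {
                    foreach (byte b in encoding.GetBytes(new[] { c }))
                        sb.Append(@"\'").Append(b.ToString("x2", CultureInfo.InvariantCulture));
                }
                catch (EncoderFallbackException)
                {
                    short signed = unchecked((short)c);
                    sb.Append(@"\u").Append(signed.ToString(CultureInfo.InvariantCulture)).Append('?');
                }
            }
        }
        return sb.ToString();
    }
}
```

For the Polish letters listed in the answer this should yield the same escapes as the replacement table (for example \'b9 for ą), while anything outside the code page is still written in a form RTF readers understand.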
0
private static string ConvertToRtf(string text)
{
    // Create a regular expression pattern to match non-ASCII characters
    string pattern = "[^\x00-\x7F]";
    // Use Regex.Replace to escape non-ASCII characters
    return Regex.Replace(text, pattern, m => m.Value[0] > 255 ? @"\u" + ((int)m.Value[0]).ToString() + "?" : @"\'" + ((int)m.Value[0]).ToString("X2").ToLowerInvariant());
}
Jaribal
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 01 '23 at 16:31