Question:

How do I convert an RTF string to a Markdown string (and back) either in C# or JS, ideally without wrapping an exe?


I have a legacy product that uses .NET's RichTextBox control. Forms that use it save their output in Microsoft's proprietary RTF format. Here is a small example of the output it can generate:

{\\rtf1\\ansi\\ansicpg1252\\uc1\\htmautsp\\deff2{\\fonttbl{\\f0\\fcharset0 Times New Roman;}{\\f2\\fcharset0 GenericSansSerif;}}{\\colortbl\\red0\\green0\\blue0;\\red255\\green255\\blue255;}\\loch\\hich\\dbch\\pard\\plain\\ltrpar\\itap0{\\lang1033\\fs18\\f2\\cf0 \\cf0\\ql{\\f2 {\\ltrch Some content here }\\li0\\ri0\\sa0\\sb0\\fi0\\ql\\par}\r\n}\r\n}

My C# .NET Core Web App needs to be able to use this stored RTF to display a "Rich Text Editor" on a web page, have the ability to update the value, and save in a format that can still be used by the legacy product.

Unfortunately, I am having trouble finding existing/modern web components that can use RTF as input. Most appear to use markdown or a custom JSON format.

Ideally, I would like to:

  1. Convert the existing RTF to Markdown using either:
    • Server side, using C#
    • Client side, using JS
  2. Use the markdown with one of the existing Rich Text Editing web components I've found.
  3. On save, convert the web component's markdown to RTF before persisting

So far, I have tried:

  • Following this CodeProject write-up for creating a custom RTF -> HTML converter: Writing Your Own RTF Converter
    • I can get it to work in a .NET Framework project, but not .NET Core
  • Using this NuGet Package: RtfPipe
    • Throws null reference errors in .NET Core projects
  • Using this Node Module: rtf-to-html
    • Only supports a small subset of RTF, creates an entire HTML document instead of a fragment, and breaks on my specific example

Note: Everything I've tried converts RTF -> HTML because I couldn't find anything for RTF -> Markdown specifically. As a last resort, I was hoping I could chain the conversions: RTF -> HTML -> Markdown (and the reverse).

Daniel Brown
  • Have you tried calling out to a standalone tool like `pandoc`? – omajid Sep 08 '17 at 17:30
  • I have this solution bookmarked as a last resort: https://stackoverflow.com/questions/6119793/convert-html-or-rtf-to-markdown-or-wiki-compatible-syntax -- I'm worried that using it would limit me to deploying to Win environments, and I'd much rather rely on a node module or NuGet package. – Daniel Brown Sep 08 '17 at 17:36
  • What is your actual question? You present your situation, but I don't actually see a question in the title or body. – Waylan Sep 08 '17 at 19:39
  • @Waylan, I'll make an edit to make it more explicit. I'm looking for help in converting an RTF string to a Markdown string (and back) either in C# or JS, ideally without wrapping an exe. – Daniel Brown Sep 08 '17 at 19:41
  • Are you looking for a tool recommendation? Because that is off-topic. – Waylan Sep 08 '17 at 19:43
  • Any method. A path to implement myself in JS or C# (regex, set of steps, etc?), a NuGet or NPM package that accomplishes the task, or any other method I may not have thought of or be aware of. – Daniel Brown Sep 08 '17 at 19:45
  • @daniel if a subset of RTF is not enough for you, do you think you will have everything you need with markdown? Does web SERVER really need to be portable? If not you can have ASP.NET Core targeting .NET Framework – Adriano Repetti Sep 08 '17 at 19:46
  • @AdrianoRepetti, maybe I should revise? The parser/converter breaks because it encounters tags it does not support. I only need to preserve formatting that would also exist in markdown. Anything else can be stripped/trashed. But yes, markdown does support everything I need to account for. – Daniel Brown Sep 08 '17 at 19:49
  • "*Any method.*" Well, then your question may be too broad. – Waylan Sep 08 '17 at 19:49
  • @Waylan, I know what my problem is "I'm unable to convert an RTF string to markdown and back to RTF". I've listed the methods I have tried (packages and custom code) and the languages I am working with. I can narrow my question if you feel it is too broad -- Not trying to argue; honestly seeking guidance. – Daniel Brown Sep 08 '17 at 19:58
  • @AdrianoRepetti, application is deployed on a windows server for global data access, but also exists across a couple hundred client servers for local data access. Roughly 10 or so are Linux based. – Daniel Brown Sep 08 '17 at 20:01
  • @DanielBrown, Pandoc is available on several platforms, not just Windows. You wouldn't be restricting yourself to Windows-based deployments by using it. – ChrisGPT was on strike Sep 08 '17 at 21:47

1 Answer

Sorry for the null reference errors you had with RtfPipe and .NET Core. The fix is now documented on the project: add the NuGet package System.Text.Encoding.CodePages and register the code page provider before converting.

using System.Text;
using RtfPipe;

#if NETCORE
  // .NET Core only: requires the NuGet package System.Text.Encoding.CodePages
  Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
#endif
// rtf holds the RTF string loaded from storage
var html = Rtf.ToHtml(rtf);
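
As background on why the registration matters: the sample RTF in the question declares ANSI code page 1252 (`\ansicpg1252`), and .NET Core cannot resolve Windows code pages such as 1252 until the provider is registered. The following is a minimal standalone check, independent of RtfPipe, that the runtime can resolve that code page. The class name `CodePageCheck` is purely illustrative.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    public static void Main()
    {
        // Register once at application startup, before any RTF is parsed.
        // On older .NET Core versions this type comes from the
        // System.Text.Encoding.CodePages NuGet package.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Without the registration above, this call throws
        // NotSupportedException on .NET Core.
        Encoding cp1252 = Encoding.GetEncoding(1252);
        Console.WriteLine(cp1252.WebName);
    }
}
```

In a web app, the registration would typically live in startup code so that it runs exactly once.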

Markdown allows raw HTML, so the `html` string produced by `Rtf.ToHtml` is technically already valid Markdown and you could stop there. Otherwise, you can convert the HTML to Markdown using my BracketPipe library. The code would look something like this:

using BracketPipe;
using RtfPipe;

private string RtfToMarkdown(string source)
{
  using (var w = new System.IO.StringWriter())
  using (var md = new MarkdownWriter(w))
  {
    Rtf.ToHtml(source, md);
    md.Flush();
    return w.ToString();
  }
}

Markdig is a good library for getting from Markdown to HTML. However, I don't have any good suggestions for getting from HTML to RTF.
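
For the Markdown to HTML half of the return trip, a minimal sketch with Markdig might look like the following. It assumes the Markdig NuGet package is referenced; the class and method names (`MarkdownToHtmlExample`, `MarkdownToHtml`) are made up for illustration, and the remaining HTML to RTF step is still not covered.

```csharp
using System;
using Markdig; // NuGet package "Markdig"

public static class MarkdownToHtmlExample
{
    // Hypothetical helper: converts the editor's Markdown to an HTML fragment.
    public static string MarkdownToHtml(string markdown)
    {
        // With no pipeline argument, Markdig uses its default pipeline.
        return Markdown.ToHtml(markdown);
    }

    public static void Main()
    {
        Console.WriteLine(MarkdownToHtml("Some **content** here"));
    }
}
```

Markdig returns an HTML fragment rather than a full document, which avoids the problem the question hit with rtf-to-html in the other direction.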

Disclaimer: I am the author of the RtfPipe and BracketPipe open source projects.

erdomke