
Is there an easy way in C++ to tell whether an RTF text string has any content aside from pure formatting?

For example this text is only formatting, there is no real content here:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}

Loading the RTF text into a RichTextControl is not an option; I want something that works fast and requires minimal resources.

riot_starter

1 Answer


The only sure-fire way is to write your own RTF parser [spec] or use a library like LibRTF. Alternatively, you might consider keeping a single RichTextControl open and loading each new RTF document into it, rather than creating and destroying the object every time.

I believe RTF is not a regular language, so it cannot be properly parsed with a regex (much like HTML, despite millions of attempts to do so). You do not need to write a complete RTF parser, though. I'd start with a simple string parser. Try:

  1. Remove content between {\ and }
  2. Remove tags. Tags begin with a backslash, \, and are followed by some text. If a backslash is followed by whitespace, it is not a tag.
  3. The document should end with at least one closing curly brace, }

Any content left which isn't whitespace should be document content, though this may have some exceptions so you'll want to test on numerous samples of RTF.
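As a rough illustration, here is a minimal C++ sketch of that idea. It is not a literal transcription of the three steps: instead of deleting substrings, it makes a single pass over the string, skips groups that it assumes hold only metadata, skips control words, and stops at the first character that looks like document text. The function names (`rtfHasContent`, `skipGroup`, `isSkippedDestination`) and the list of skipped destinations are my own assumptions, not anything taken from the RTF spec or LibRTF, so expect to adjust them for real samples.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Groups whose text is treated as metadata rather than document content.
// ASSUMPTION: this list is illustrative and far from complete.
static bool isSkippedDestination(const std::string& word) {
    return word == "fonttbl" || word == "colortbl" ||
           word == "stylesheet" || word == "info" || word == "pict";
}

// Given the index of a '{', return the index just past its matching '}'.
static std::size_t skipGroup(const std::string& rtf, std::size_t i) {
    assert(i < rtf.size() && rtf[i] == '{');
    int depth = 0;
    while (i < rtf.size()) {
        char c = rtf[i];
        if (c == '\\' && i + 1 < rtf.size()) { i += 2; continue; }  // skip \{ \} \\ and friends
        if (c == '{') ++depth;
        else if (c == '}' && --depth == 0) return i + 1;
        ++i;
    }
    return rtf.size();  // unbalanced input: treat the rest as part of the group
}

// Returns true as soon as something that looks like document text is found.
bool rtfHasContent(const std::string& rtf) {
    const std::size_t n = rtf.size();
    std::size_t i = 0;
    while (i < n) {
        char c = rtf[i];
        if (c == '{') {
            std::size_t j = i + 1;
            // {\* ...} marks an ignorable destination.
            bool skip = (j + 1 < n && rtf[j] == '\\' && rtf[j + 1] == '*');
            if (!skip && j < n && rtf[j] == '\\') {
                std::size_t k = j + 1;
                while (k < n && isAlpha(rtf[k])) ++k;
                skip = isSkippedDestination(rtf.substr(j + 1, k - j - 1));
            }
            i = skip ? skipGroup(rtf, i) : i + 1;
        } else if (c == '}') {
            ++i;
        } else if (c == '\\') {
            if (i + 1 >= n) break;
            char next = rtf[i + 1];
            if (isAlpha(next)) {
                // Control word: letters, optional signed number, optional one-space delimiter.
                ++i;
                while (i < n && isAlpha(rtf[i])) ++i;
                if (i + 1 < n && rtf[i] == '-' && isDigit(rtf[i + 1])) ++i;
                while (i < n && isDigit(rtf[i])) ++i;
                if (i < n && rtf[i] == ' ') ++i;
            } else if (next == '\'' || next == '\\' || next == '{' || next == '}') {
                return true;  // \'hh or an escaped literal character counts as content
            } else {
                i += 2;       // other control symbols (e.g. \~) are treated as formatting
            }
        } else if (!isSpace(c)) {
            return true;      // plain text outside any skipped group
        } else {
            ++i;
        }
    }
    return false;
}
```

With the sample from the question, `rtfHasContent` returns false; appending something like `\f0\pard Hello\par}` to it makes it return true. The function never builds a stripped copy of the string and returns at the first content character, which should keep it cheap compared with loading the text into a control.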

Charles Burns
  • Good pointer to LibRTF. From some of the samples on http://search.cpan.org/~sburke/RTF-Writer/lib/RTF/Cookbook.pod#RTF_Document_Structure, I don't think steps 1 and 2 of your proposed parser are correct and would skip non-format content. – holtavolt Mar 23 '12 at 15:16
  • @holtavolt: I suspect there are many RTF samples where the simple rules listed wouldn't work. It may be a good ruleset to get the 80/20 return on time and further refine from there. – Charles Burns Mar 23 '12 at 17:23
  • I am not entirely satisfied with this answer, but after looking through other questions and web resources, it seems there is just no better way to do it, so I will accept it. – riot_starter Mar 24 '12 at 21:29