4

My database has a column that holds text in RTF format. How can I extract only the plain text from it using Java?

Programmer
  • You might find [this](http://stackoverflow.com/questions/10317030/java-rtf-import-edit-and-export-possible) interesting. – assylias Aug 08 '12 at 14:41
  • 1
    Possible duplicate of: [regular-expression-for-extracting-text-from-an-rtf-string](http://stackoverflow.com/questions/188545/regular-expression-for-extracting-text-from-an-rtf-string) – John Smith Aug 08 '12 at 14:43

4 Answers

2
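import java.io.ByteArrayInputStream;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// rtfBytes holds the RTF content as bytes, e.g. rtfString.getBytes()
RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
String text = document.getText(0, document.getLength());

This should work. Note that read() and getText() throw checked exceptions (IOException, BadLocationException) that the caller must handle.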
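For the case where the RTF is already a String (for example, read from a database column), the same approach can be wrapped in a small helper. This is only a sketch: the class name `RtfToText`, the method name `toPlainText`, the sample RTF string, and the final `trim()` (to drop the trailing newline the parser tends to leave) are assumptions for illustration, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; the name is chosen for this example only.
final class RtfToText {

    // Parses an RTF string with Swing's RTFEditorKit and returns its plain text.
    static String toPlainText(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();

        // RTF markup is 8-bit text, so ISO-8859-1 maps each char to one byte.
        byte[] bytes = rtf.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            kit.read(in, doc, 0);
        }

        // trim() is an assumption: it removes surrounding whitespace/newlines.
        return doc.getText(0, doc.getLength()).trim();
    }

    public static void main(String[] args) throws Exception {
        // Simplified sample input, made up for this sketch.
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0\\fnil Tahoma;}}\\pard\\f0\\fs18 2134\\par}";
        System.out.println(toPlainText(rtf));
    }
}
```

No Swing component is shown on screen here; only the document model is used, so this kind of helper is commonly called from non-GUI code as well.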

Ben Arnao
0

If you can, try "AdvancedRTFEditorKit"; it might be a good fit. It is available at http://java-sl.com/advanced_rtf_editor_kit.html

I have used it to build a complete RTF editor with all the formatting features MS Word supports.

PeakGen
  • I need a simple parser that takes a string like this: {\rtf1\fbidis\ansi\ansicpg1255\deff0\deflang1037{\fonttbl{\f0\fnil\fcharset0 Tahoma;}} {\colortbl ;\red0\green0\blue0;} \viewkind4\uc1\pard\ltrpar\cf1\f0\fs18 2134\par } and returns the plain text: '2134'. It's not from a file; it's a simple VARCHAR2(4000) column that is mapped to a String field. – Programmer Aug 09 '12 at 09:34
0

Apache POI will also read Microsoft Word formats, not just RTF.

POI

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Note: HWPFDocument reads binary Word (.doc) files. As the comments below
// report, it rejects RTF input with an IllegalArgumentException.
public String getRtfText(String fileName) throws IOException {
    // try-with-resources closes the stream even if parsing fails.
    try (InputStream inStream = new FileInputStream(fileName)) {
        // HWPFDocument reads the document from the input stream.
        HWPFDocument doc = new HWPFDocument(inStream);
        WordExtractor extractor = new WordExtractor(doc);

        // Each array element holds the text of one paragraph.
        StringBuilder text = new StringBuilder();
        for (String paragraph : extractor.getParagraphText()) {
            text.append(paragraph);
        }
        return text.toString();
    }
}
Mike
  • I need something like this as we have in C#: static public string ConvertToText(string rtf) { RichTextBox rtb = new RichTextBox(); rtb.Rtf = rtf; return rtb.Text; } – Programmer Aug 09 '12 at 09:37
  • 1
    This does NOT work. POI does not parse RTF documents. (I tried it and got an exception saying POI does not parse RTF documents!) – Mary Jun 20 '13 at 21:02
  • Yes, I downloaded the latest POI from apache and the entire package hwpf does not exist. – george_h Dec 19 '13 at 08:51
  • Sure it does; it is a subcomponent for Word (HWPF+XWPF): http://poi.apache.org/hwpf/index.html – Bernhard Feb 18 '14 at 10:16
  • 3
    The above code doesn't open rtf files. It throws the "java.lang.IllegalArgumentException: The document is really a RTF file" exception. So I assume that POI doesn't support opening rtf files. – Alex Lipov Sep 01 '14 at 14:46
0

This works if the RTF text is already loaded in a JEditorPane:

String s = getPlainText(aJEditorPane.getDocument());

String getPlainText(Document doc) {
    try {
        return doc.getText(0, doc.getLength());
    }
    catch (BadLocationException ex) {
        System.err.println(ex);
        return null;
    }
}
Jens S.