
Question:

I need to read an RTF file that contains an OLE object as an inner document.

RTF file = [a Word document embedded in it as an OLE object]

Sample RTF file with a Word document embedded as an OLE object.

What I have already looked at:

  1. OLE as Image in RTF

That question has a program that extracts an image embedded as OLE in an RTF file.

I tried the program from the accepted answer, but it does not work for me.

  2. The Open XML SDK (it cannot open RTF files).

  3. Other SDKs such as GemBox, which cannot open the inner document (i.e. the OLE object in the RTF).

What I have tried:

I used Microsoft.Office.Interop.Word.dll, which gives an accurate result, but it will not work on the server.

For example, it opens the RTF file using MS Word installed on the client machine, but there is no Word application installed on the server.

So this is not suitable for me.

I need to open and read the RTF OLE content and store it in a string (for example), because with a string I can do a lot of things.

Does anyone have an idea how to solve my issue?

Sivabalakrishnan
  • Your .rtf file doesn't contain an OLE `package` object (like in my previous answer), but a `Word.Document.12` object (.RTF is a text format underneath). Just remove the test for "package" in the sample code so you'll get data from GetNextTextAsByteArray as a byte[]. From this data, in the Open Xml (.zip format) case, just look for the first 'PK' string (or 0x50 0x4B in byte hex) and this will be the start of the .docx or other document. – Simon Mourier Oct 22 '18 at 13:15
  • My input is an OLE file. It cannot be read or opened by any application, so I prepended an RTF header to the OLE file and turned it into an RTF file. Now it can be opened with MS Word. The given file was edited by me. It is the same as your case, except that instead of an image I have a Word document attached to it. @SimonMourier – Sivabalakrishnan Oct 23 '18 at 05:46
  • OLE is a wide term. Your input is not an ole *package*, it's an ole *word.document.12* which is different than in my answer. – Simon Mourier Oct 23 '18 at 05:53
  • Okay. What is a possible way to do this? Is there any package that helps to read it? @SimonMourier – Sivabalakrishnan Oct 23 '18 at 08:44
  • I told you how to extract the .docx file – Simon Mourier Oct 23 '18 at 08:54
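
The approach described in the comments above (decode the object data to a byte[], then look for the first 'PK' marker) might be sketched roughly as follows. This is only an illustration under assumptions: `OleDocxLocator`, `HexToBytes`, `FindZipStart` and `SliceFromZipStart` are hypothetical names, `HexToBytes` merely stands in for whatever produces the byte[] (such as `GetNextTextAsByteArray` in the linked answer), and the search uses the four-byte ZIP local file header signature `50 4B 03 04` rather than just the two bytes `50 4B` to reduce false matches. It also assumes the package bytes are stored contiguously inside the OLE data, which is not guaranteed.

```csharp
using System;
using System.Linq;

static class OleDocxLocator
{
    // Decodes hex text (as found in an RTF object data group) into bytes,
    // ignoring whitespace and any other non-hex characters.
    public static byte[] HexToBytes(string hex)
    {
        char[] digits = hex.Where(c => Uri.IsHexDigit(c)).ToArray();
        var bytes = new byte[digits.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(new string(digits, i * 2, 2), 16);
        }
        return bytes;
    }

    // Returns the offset of the first ZIP local file header ("PK" 0x03 0x04),
    // or -1 when the data contains no such signature.
    public static int FindZipStart(byte[] data)
    {
        for (int i = 0; i + 3 < data.Length; i++)
        {
            if (data[i] == 0x50 && data[i + 1] == 0x4B &&
                data[i + 2] == 0x03 && data[i + 3] == 0x04)
            {
                return i;
            }
        }
        return -1;
    }

    // Returns everything from the ZIP signature to the end of the data,
    // or an empty array when no signature is found.
    public static byte[] SliceFromZipStart(byte[] data)
    {
        int start = FindZipStart(data);
        return start < 0 ? Array.Empty<byte>() : data.Skip(start).ToArray();
    }
}
```

The slice returned by `SliceFromZipStart` could then be written to a .docx file or handed to a ZIP or Open XML reader. Any bytes that follow the package inside the OLE data are kept in the slice, so a strict reader may still reject it.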

2 Answers


Please use the following code example to extract the OLE object (Word document) from RTF and import it into Aspose.Words’ DOM to read its content. Hope this helps you.

using System;
using Aspose.Words;
using Aspose.Words.Drawing;

// MyDir is the folder that contains SAMPLE.rtf.
Document doc = new Document(MyDir + "SAMPLE.rtf");

// Get the first shape in the document; it may hold the OLE object.
Shape shape = (Shape)doc.GetChild(NodeType.Shape, 0, true);
if (shape != null && shape.OleFormat != null)
{
    // Save the embedded OLE object to disk.
    string olePath = MyDir + "output" + shape.OleFormat.SuggestedExtension;
    shape.OleFormat.Save(olePath);

    if (shape.OleFormat.SuggestedExtension == ".docx")
    {
        // Import the .docx OLE object into Aspose.Words' DOM and print its text.
        Document ole = new Document(olePath);
        Console.WriteLine(ole.ToString(SaveFormat.Text));
    }
}

I work with Aspose as Developer Evangelist.

Tahir Manzoor

Thanks for the above answer. Here is another version of the code, which iterates over all the OLE objects and saves each one to a local path under its original file name.

string MyDir = @"E:\temp\";
Document doc = new Document(MyDir + "Requirement#4.rtf");

NodeCollection nodeColl = doc.GetChildNodes(NodeType.Shape, true);
foreach (var node in nodeColl)
{
    Shape shape1 = (Shape)node;
    if (shape1.OleFormat != null)
    {
        shape1.OleFormat.Save(MyDir + shape1.OleFormat.SuggestedFileName + shape1.OleFormat.SuggestedExtension);
    }
}