I have an RTF file with content like this:

{\object\objemb{\*\objclass Excel.Sheet.12}\objw8415\objh3015{\*\objdata 
01050000
02000000
0f000000...}}}

(The embedded object may be an Excel or a Word document.)

What I need is to extract the \objdata part into an external file so that it can be edited. Afterwards, the edited file should be converted back into an embedded object in the RTF file.

I have already searched around, and this does not seem to be a trivial problem. Using the code from this post with a small modification, I tried to access the objdata and save it to a file, but the result is not a valid Excel file:

if (RtfReader.MoveToNextControlWord(enumerator, "objdata"))
{
    byte[] data = RtfReader.GetNextTextAsByteArray(enumerator);
    using (MemoryStream packageData = new MemoryStream())
    {
        RtfReader.ExtractObjectData(new MemoryStream(data), packageData);
        File.WriteAllBytes(@"c:\temp\some-excel.xls", ReadToEnd(packageData));
    }
}

Does anyone have an idea how to achieve this?

Thanks a lot in advance for any help!

Hardy

1 Answer

In this case, the content of the objdata is a Compound File. You can spot the famous 'd0cf11e0' header (looks like "docfile"). More on this here: Developing a tool to recognise MS Office file types ( .doc, .xls, .mdb, .ppt ).
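
If you want to check for that header in code once the hex text has been decoded to bytes, a minimal sketch could look like this. The names `CompoundFileCheck` and `LooksLikeCompoundFile` are made up for illustration; the sketch compares against the full eight-byte Compound File signature, of which `d0cf11e0` is the first half.

```csharp
using System;

public static class CompoundFileCheck
{
    // Full 8-byte Compound File signature; the first four bytes
    // are the "d0cf11e0" marker mentioned above.
    private static readonly byte[] Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true when the buffer starts with the signature.
    public static bool LooksLikeCompoundFile(byte[] data)
    {
        if (data == null || data.Length < Signature.Length)
            return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
                return false;
        }
        return true;
    }
}
```

This only tells you that the bytes are a Compound File, not which application produced it; for that you still need the object class from the RTF.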

I have written a small example that you can use to extract the data. You can use it like this:

string ole = "2090_Object_Text_0.ole"; // your file
string text = File.ReadAllText(ole);
// Adapt the extension to the object class (Word.Document.8 is a .doc).
DocFile.Save(text, "mydoc.doc");

And the DocFile helper code:

using System;
using System.IO;

public static class DocFile
{
    // Magic Compound File ("doc file") header, as it appears in the hex text.
    // More on this: http://social.msdn.microsoft.com/Forums/en-US/343d09e3-5fdf-4b4a-9fa6-8ccb37a35930/developing-a-tool-to-recognise-ms-office-file-types-doc-xls-mdb-ppt-
    private const string Header = "d0cf11e0";

    public static void Save(string text, string filePath)
    {
        if (text == null)
            throw new ArgumentNullException("text");

        if (filePath == null)
            throw new ArgumentNullException("filePath");

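        // Hex digits in RTF may be upper- or lowercase, so ignore case.
        int start = text.IndexOf(Header, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            throw new ArgumentException("Text does not contain a doc file.", "text");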

        // The hex data runs up to the brace that closes the \objdata group.
        int end = text.IndexOf('}', start);
        if (end < 0)
        {
            end = text.Length;
        }

        using (MemoryStream bytes = new MemoryStream())
        {
            bool highByte = true;
            byte b = 0;
            for (int i = start; i < end; i++)
            {
                char c = text[i];
                if (char.IsWhiteSpace(c))
                    continue;

                if (highByte)
                {
                    b = (byte)(16 * GetHexValue(c));
                }
                else
                {
                    b |= GetHexValue(c);
                    bytes.WriteByte(b);
                }
                highByte = !highByte;
            }
            File.WriteAllBytes(filePath, bytes.ToArray());
        }
    }

    private static byte GetHexValue(char c)
    {
        if (c >= '0' && c <= '9')
            return (byte)(c - '0');

        if (c >= 'a' && c <= 'f')
            return (byte)(10 + (c - 'a'));

        if (c >= 'A' && c <= 'F')
            return (byte)(10 + (c - 'A'));

        throw new ArgumentException("Not a hexadecimal digit: " + c, "c");
    }
}
Simon Mourier
  • Cool - actually much simpler than the solutions I (unsuccessfully) tried before. And it works! Thanks a lot! – Hardy Oct 02 '13 at 09:11
  • By the way: how does the other way round work? I.e., to create an .ole file from an existing .doc? Do I have to take a Stream, read the original file, convert the bytes into a textual representation, and wrap the {\...} stuff around it? – Hardy Oct 02 '13 at 09:15
  • Yes, but don't forget the small header that's before the `d0cf11e0` marker. – Simon Mourier Oct 02 '13 at 09:28
  • How important are the line breaks that exist in the original .ole file but currently do not exist in the binary part of the created .ole file? In other words: is it necessary to split the binary part into 64-byte portions with line breaks to get a valid .ole? – Hardy Oct 02 '13 at 10:19
  • No, it's not important; it's just more readable in a standard text editor. If you really need insight into this, the RTF specification is public :-) – Simon Mourier Oct 02 '13 at 12:07
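
For the reverse direction discussed in the comments above, the byte-to-hex step could be sketched roughly as follows. `HexText` and `Encode` are hypothetical names, and the line wrapping is optional and purely cosmetic, as noted. The sketch produces only the hex text: the small header that precedes the `d0cf11e0` marker and the surrounding {\object ...} group are not generated here and would have to be taken from the original object or built according to the specification.

```csharp
using System;
using System.Text;

public static class HexText
{
    // Converts bytes to lowercase hex text. A line break is inserted after
    // every bytesPerLine bytes; a value of 0 or less disables wrapping.
    public static string Encode(byte[] data, int bytesPerLine)
    {
        if (data == null)
            throw new ArgumentNullException("data");

        StringBuilder sb = new StringBuilder(data.Length * 2);
        for (int i = 0; i < data.Length; i++)
        {
            if (bytesPerLine > 0 && i > 0 && i % bytesPerLine == 0)
                sb.Append("\r\n");

            // "x2" formats one byte as two lowercase hex digits.
            sb.Append(data[i].ToString("x2"));
        }
        return sb.ToString();
    }
}
```

A call such as `HexText.Encode(File.ReadAllBytes("mydoc.doc"), 64)` would give 64-byte lines similar to the original file; passing 0 gives one long line.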