The input is a set of Excel files whose cells may contain basic HTML formatting such as <b>, <br>, and <h2>.

I want to read the strings and insert them as formatted text into Word documents, i.e. <b>Foo</b> should appear as a bold string in Word.

I don't know in advance which tags are used, so I need a generic solution; a find/replace approach does not work for me.

I found a solution from January 2011 using the WebBrowser component. So the HTML is converted to RTF and the RTF is inserted into Word. I was wondering if there is a better solution today.

Using a commercial component is fine for me.

Update

I came across Matthew Manela's MarkupConverter class, which converts HTML to RTF. I then use the clipboard to insert the snippet into the Word file:

// rtf contains the converted html string using MarkupConverter
Clipboard.SetText(rtf, TextDataFormat.Rtf);
// objTable is a table in my word file
objTable.Cell(1, 1).Range.Paste();

This works, but will copy/pasting up to a few thousand strings using the clipboard break anything?

herrjeh42
  • Do you need to use Office Interop, or would OpenXML be fine too? – flipchart May 02 '13 at 10:04
  • I need to insert my strings into Word tables and measure the height of the table cells. Does this work with OpenXML, too? – herrjeh42 May 02 '13 at 10:41
  • OpenXML can be used to manipulate docx files, including inserting HTML (into tables). After the document is built you can measure the heights using office interop. I'm not sure if OpenXML would be able to give you correct height. I'll put up an example later today – flipchart May 02 '13 at 11:15
  • that sounds better than my copy-paste approach :-) – herrjeh42 May 02 '13 at 20:04

3 Answers

You will need the OpenXML SDK in order to work with OpenXML. It can be quite tricky to get into, but it is very powerful, and a whole lot more stable and reliable than Office Automation or Interop.

The following will open a document, create an AltChunk part, add the HTML to it, and embed it into the document. For a broader overview of AltChunk, see Eric White's blog.

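using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Open the existing document for editing (second argument: isEditable)
using (var wordDoc = WordprocessingDocument.Open("DocumentName.docx", true))
{
    var altChunkId = "AltChunkId1";
    var mainPart = wordDoc.MainDocumentPart;

    // Create the part that will hold the HTML
    var chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
    using (var textStream = new MemoryStream())
    {
        var html = "<html><body>...</body></html>";
        var data = Encoding.UTF8.GetBytes(html);
        textStream.Write(data, 0, data.Length);
        textStream.Position = 0; // rewind so FeedData reads from the start
        chunk.FeedData(textStream);
    }

    // Reference the part from the body; here it goes at the first position
    var altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document.Body.InsertAt(altChunk, 0);
    mainPart.Document.Save();
}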

Obviously, for your case you will want to find (or build) the table you want and insert the AltChunk there instead of at the first position in the body. Note that the HTML you insert into the Word document must be a full HTML document, with an <html> tag. I'm not sure if <body> is required, but it doesn't hurt. If you just have HTML-formatted text, simply wrap the text in these tags and insert it into the document.
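The wrapping step could look roughly like the sketch below. HtmlChunkHelper, WrapFragment and ChunkId are hypothetical names, not part of the OpenXML SDK. The sketch assumes that a fragment without an <html> tag only needs <html><body>…</body></html> around it, and that each inserted chunk gets its own relationship id.

```csharp
using System;

public static class HtmlChunkHelper
{
    // Hypothetical helper: wraps an HTML fragment in <html><body>...</body></html>
    // unless the text already contains an <html> tag.
    public static string WrapFragment(string fragment)
    {
        if (fragment == null) throw new ArgumentNullException(nameof(fragment));
        if (fragment.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0)
            return fragment;
        return "<html><body>" + fragment + "</body></html>";
    }

    // Hypothetical helper: derives a distinct AltChunk id per cell, assuming
    // every chunk added to the same main part needs its own id.
    public static string ChunkId(int index)
    {
        return "AltChunkId" + index;
    }
}
```

For example, HtmlChunkHelper.WrapFragment("<b>Foo</b>") returns "<html><body><b>Foo</b></body></html>", which can then be encoded and passed to FeedData as in the code above.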

It seems that you will need to use Office Automation/Interop to get the table heights. See this answer which says that the OpenXML SDK does not update the heights, only Word does.

flipchart

Use this code; it works.

Response.AppendHeader("content-disposition", "attachment;filename=FileEName.xls");
Response.Charset = "";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.ContentType = "application/vnd.ms-excel";
this.EnableViewState = false;
//Response.Write("Your HTML Code");
Response.Write("<table border='1 px solid'><tr><th>sfsd</th><th>sfsdfssd</th></tr><tr>" +
    "<td>ssfsdf</td><td><table border='1 px solid'><tr><th>sdf</th><th>hhsdf</th></tr><tr>" +
    "<td>sdfds</td><td>sdhjhfds</td></tr></table></td></tr></table>");
Response.End();
Esha Garg

Why not let Word do its own translation, since it understands HTML?

  1. Read your Excel cells.
  2. Write your values into an HTML text file, as if it were a Word document.
  3. Open Word and let it read that HTML file.
  4. Instruct Word to save the document as a new Word document (if that is required).
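Steps 2 to 4 could be sketched as follows, assuming the Word interop assembly (Microsoft.Office.Interop.Word) is referenced and Word is installed. HtmlToWordSketch, BuildHtml and ConvertToDocx are hypothetical names, and putting one cell value per table row is only an illustrative layout. Reading the Excel cells (step 1) is left out.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Word = Microsoft.Office.Interop.Word;

public static class HtmlToWordSketch
{
    // Step 2: build one HTML document with one table row per cell value.
    // The values are assumed to be HTML fragments already, so they are not escaped.
    public static string BuildHtml(IEnumerable<string> cellValues)
    {
        var sb = new StringBuilder("<html><body><table>");
        foreach (var value in cellValues)
            sb.Append("<tr><td>").Append(value).Append("</td></tr>");
        return sb.Append("</table></body></html>").ToString();
    }

    // Steps 3 and 4: let Word open the HTML file and save it as .docx.
    public static void ConvertToDocx(string htmlPath, string docxPath)
    {
        var app = new Word.Application();
        try
        {
            // Full paths avoid depending on Word's own working directory.
            Word.Document doc = app.Documents.Open(Path.GetFullPath(htmlPath));
            doc.SaveAs2(Path.GetFullPath(docxPath), Word.WdSaveFormat.wdFormatXMLDocument);
            doc.Close();
        }
        finally
        {
            app.Quit(); // always shut down the Word process started above
        }
    }
}
```

A caller would write the result of BuildHtml to a file, for example with File.WriteAllText("cells.html", html, Encoding.UTF8), and then call ConvertToDocx("cells.html", "cells.docx").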
Martin Mulder
  • Hm... here's the context of my question: after inserting the formatted strings into Word tables I will measure the height of the cells in Word and save this information somewhere. I will do it again for other versions of the same string. Then I will manipulate a word template and set the table cell to the max height measured before. Working with the HTML import and having everything in "unstructured" html adds complexity, doesn't it? – herrjeh42 May 01 '13 at 13:53
  • Yes it does add complexity. But sometimes a problem does not always have an easy solution :) – Martin Mulder May 01 '13 at 13:56
  • It's the question of "return on invest" - what do you get in return of having a more complex solution, i.e. copy-paste vs HTML... thanx anyway :-) – herrjeh42 May 01 '13 at 18:06