
I am writing some C# VSTO code that reads a Microsoft Word document and saves it as Filtered HTML. When I run this on a generic Word document, the resulting HTML file declares a Windows character set:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

If I open a document, go to File->Options->Advanced->Web Options, and choose UTF-8 as the encoding, the resulting filtered HTML output looks like this:

<meta http-equiv=Content-Type content="text/html; charset=utf-8">

I want to write C# code that saves any Word document to filtered HTML with UTF-8 encoding. After doing some research, I found people saying that setting the encoding through "SaveAs2" does not work, even though Microsoft documents it as a feature. Indeed, this code does not work for me:

doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML, Encoding: "65001");

(Note: I tried passing 65001 both with and without quotes. Neither throws an error, but neither works.)

Next, I moved on to setting the web options for the document like this:

doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2(destFile, MsWord.WdSaveFormat.wdFormatFilteredHTML);

As far as I know, the code above does exactly what I do by hand when I open a file, go to File->Options..., set the encoding to UTF-8, and save it as filtered HTML, yet the output still looks like this:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

Is there a way to force Microsoft Word to output a file to UTF-8 without having to manually configure the document first?

Bill
  • http://stackoverflow.com/questions/5568033/convert-a-strings-character-encoding-from-windows-1252-to-utf-8 – user3648716 Dec 02 '15 at 07:27
  • Using `doc.WebOptions.Encoding = msoEncodingUTF8` works for me. Are you sure you are checking the correct file? – Dirk Vollmar Dec 02 '15 at 15:58
  • @DirkVollmar, I had been trying this and it did not work. Then I called doc.Fields.Update() before setting the encoding, and it worked. Odd. – Bill Dec 02 '15 at 16:12
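Following Dirk Vollmar's question about checking the correct file, and the first comment's link about converting windows-1252 text to UTF-8, here is a minimal sketch of a helper that reads the charset a saved file actually declares and, as a fallback, re-encodes a windows-1252 file as UTF-8. The class and method names (`HtmlCharsetHelper`, `GetDeclaredCharset`, `DeclareUtf8`, `ConvertFileToUtf8`) are hypothetical and not part of Word or VSTO, and the regex assumes the `charset=` token appears in the meta tag as shown in the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Word/VSTO: inspects and, if needed,
// fixes the charset of a filtered-HTML file after SaveAs2 has written it.
public static class HtmlCharsetHelper
{
    // Matches e.g. charset=windows-1252 or charset=utf-8 inside
    // <meta http-equiv=Content-Type content="text/html; charset=...">
    private static readonly Regex CharsetPattern =
        new Regex(@"charset=([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Returns the first declared charset (e.g. "windows-1252"), or null if none.
    public static string GetDeclaredCharset(string html)
    {
        Match match = CharsetPattern.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Rewrites only the first charset declaration to utf-8 (text change only).
    public static string DeclareUtf8(string html)
    {
        return CharsetPattern.Replace(html, "charset=utf-8", 1);
    }

    // Decodes the file with sourceEncoding, updates the meta tag, and writes
    // it back as UTF-8 without a BOM. Assumes the file really is in sourceEncoding.
    public static void ConvertFileToUtf8(string path, Encoding sourceEncoding)
    {
        string html = File.ReadAllText(path, sourceEncoding);
        File.WriteAllText(path, DeclareUtf8(html), new UTF8Encoding(false));
    }
}
```

After saving, `HtmlCharsetHelper.GetDeclaredCharset(File.ReadAllText(destFile))` should return `utf-8` if the WebOptions setting took effect. If it still returns `windows-1252`, `HtmlCharsetHelper.ConvertFileToUtf8(destFile, Encoding.GetEncoding(1252))` re-encodes the file. `Encoding.GetEncoding(1252)` works directly on .NET Framework (which VSTO uses); on .NET Core or .NET 5+ you would first have to register `CodePagesEncodingProvider.Instance` with `Encoding.RegisterProvider`.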

2 Answers


At the time of writing, I can't tell whether I have hit a bug in my specific version of Microsoft Word (Word Online) or in the VSTO template, but here is what made this work for me.

If this code does not work:

doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML);

then change the code to refresh the document's fields before setting the encoding, like this:

doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.Fields.Update(); // New line: refresh all fields before setting the encoding.
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML);
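For reuse, the same steps can be wrapped in a method that also closes the document afterwards. This is a sketch, not a tested implementation: the class and method names are hypothetical, it assumes the `MsWord` alias for `Microsoft.Office.Interop.Word` used in the question plus a reference to the Office core library (for `MsoEncoding`), and it takes the Word `Application` as a parameter instead of creating one, because in a VSTO add-in that object would typically be `Globals.ThisAddIn.Application` and must not be quit.

```csharp
using System;
using Microsoft.Office.Core;                  // MsoEncoding (Office core library reference)
using MsWord = Microsoft.Office.Interop.Word; // same alias as in the question

// Hypothetical wrapper around the workaround from this answer.
public static class FilteredHtmlExporter
{
    public static void SaveAsUtf8FilteredHtml(MsWord.Application app, string sourcePath, string destPath)
    {
        MsWord.Document doc = app.Documents.Open(sourcePath);
        try
        {
            // Workaround reported above: refresh fields before setting the encoding.
            doc.Fields.Update();

            // Programmatic counterpart of File > Options > Advanced > Web Options > Encoding.
            doc.WebOptions.Encoding = MsoEncoding.msoEncodingUTF8;

            doc.SaveAs2(destPath, MsWord.WdSaveFormat.wdFormatFilteredHTML);
        }
        finally
        {
            // Cast to _Document to avoid the ambiguity between the Close method and the Close event.
            ((MsWord._Document)doc).Close(MsWord.WdSaveOptions.wdDoNotSaveChanges);
        }
    }
}
```

A call would look like `FilteredHtmlExporter.SaveAsUtf8FilteredHtml(app, "C:\\Temp\\Test.docx", "C:\\Temp\\Test.htm");`. Closing with `wdDoNotSaveChanges` discards the in-memory field updates, so the source document is not modified.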
Bill

To use the `Microsoft.Office.Core` types (such as `MsoEncoding`) in Visual Studio, you have to add a reference to your project:

'MyProject' > Solution Explorer > References > Add Reference > COM > Microsoft Office 16.0 Object Library

Without this reference, Visual Studio gives a very cryptic error that does not really help you find the missing library.