I am writing some C# VSTO code that reads a Microsoft Word document and saves it as Filtered HTML. When I do this with a generic Word document, the resulting HTML file declares a Windows charset, as shown here:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
If I open a document and go to File->Options->Advanced->Web Options, I can choose UTF-8 as the encoding, and the resulting Filtered HTML output looks like this:
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
I want to write C# code that saves any Word document to Filtered HTML with UTF-8 encoding. After doing some research, I found reports that the Encoding parameter of "SaveAs2" does not work, even though Microsoft documents it as a feature. That matches what I see, because this code does not work for me:
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML, Encoding: "65001");
(Note: I tried passing 65001 both with and without quotes. Neither throws an error, but neither changes the output.)
Next, I moved on to setting the web options for the document like this:
doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2(destFile, MsWord.WdSaveFormat.wdFormatFilteredHTML);
As far as I can tell, the code above does exactly what I do manually (open the file, go to File->Options->Advanced->Web Options, select UTF-8, and save as Filtered HTML), yet the output still looks like this:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
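For anyone who wants to reproduce this, a self-contained sketch of the second attempt as a console program might look like the following. The class name, the file paths, and the charset check at the end are illustrative assumptions rather than part of my add-in, and it assumes references to the Microsoft.Office.Interop.Word and office (Microsoft.Office.Core) interop assemblies.

```csharp
using System;
using System.IO;
using Microsoft.Office.Core;
using MsWord = Microsoft.Office.Interop.Word;

// Hypothetical repro harness: opens a document, sets the web encoding,
// saves as Filtered HTML, then reports which charset the output declares.
class FilteredHtmlRepro
{
    static void Main()
    {
        // Assumed paths, matching the ones used in the snippets above.
        string srcFile = "C:\\Temp\\Test.docx";
        string destFile = "C:\\Temp\\Test.htm";

        var app = new MsWord.Application();
        MsWord.Document doc = null;
        try
        {
            doc = app.Documents.Open(srcFile);

            // Same setting as File->Options->Advanced->Web Options->Encoding.
            doc.WebOptions.Encoding = MsoEncoding.msoEncodingUTF8;

            doc.SaveAs2(destFile, MsWord.WdSaveFormat.wdFormatFilteredHTML);
        }
        finally
        {
            // Cast to the underscore interfaces to avoid the method/event
            // name ambiguity on Close and Quit.
            if (doc != null)
            {
                ((MsWord._Document)doc).Close(SaveChanges: false);
            }
            ((MsWord._Application)app).Quit();
        }

        // Print the charset declaration that actually ended up in the file.
        string html = File.ReadAllText(destFile);
        bool isUtf8 = html.IndexOf("charset=utf-8", StringComparison.OrdinalIgnoreCase) >= 0;
        Console.WriteLine(isUtf8 ? "Output declares utf-8" : "Output does not declare utf-8");
    }
}
```

The check only looks at the meta tag text, so it says nothing about whether the bytes in the file are actually UTF-8 encoded.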
Is there a way to force Microsoft Word to save a file as UTF-8 Filtered HTML from code, without having to manually configure the document first?