I'm trying to read text and images from a Word document and close it. The problem is trying to close it without Word encountering any issues OR creating multiple WINWORD.exe instances. My problem is that when I call Marshal.FinalReleaseComObject(app);
on the Word.ApplicationClass
, Word fires a generic exception provided by Windows ("Word has stopped working"). I have read many of the solutions in How do I properly clean up Excel interop objects? and implemented the recommendations, but I still have the issue.
Here is my code. I am only reading one Word file with one page (you may want to skip to "// Cleanup:" where the exception occurs).
private byte[] GetDocumentText(byte[] wordBytes, string path)
{
// Save bytes to word file in temp dir, open, copy info. Then delete the temp file after.
object x = Type.Missing;
string ext = Path.GetExtension(path).ToLower();
string tmpPath = Path.ChangeExtension(Path.GetTempFileName(), ext);
File.WriteAllBytes(tmpPath, wordBytes);
// Open temp file with Excel Interop:
Word.ApplicationClass app = new Word.ApplicationClass();
Word.Documents docs = app.Documents;
Word.Document doc = docs.Open(tmpPath, x, x, x, x, x, x, x, x, x, x, x, x, x, x);
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
IDataObject data = Clipboard.GetDataObject();
string documentText = data.GetData(DataFormats.Text).ToString();
// Add text to pages.
byte[] wordDoc = null;
using (MemoryStream myMemoryStream = new MemoryStream())
{
Document myDocument = new Document();
PdfWriter myPDFWriter = PdfWriter.GetInstance(myDocument, myMemoryStream); // REQUIRED.
PdfPTable table = new PdfPTable(1);
myDocument.Open();
// Create a font that will accept unicode characters.
BaseFont bfArial = BaseFont.CreateFont(@"C:\Windows\Fonts\Arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font arial = new Font(bfArial, 12);
// If Hebrew character found, change page direction of documentText.
PdfPCell page = new PdfPCell(new Paragraph(documentText, arial)) { Colspan = 1 };
Match rgx = Regex.Match(documentText, @"\p{IsArabic}|\p{IsHebrew}");
if (rgx.Success) page.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
table.AddCell(page);
// Add image to document (Not in order with text...)
foreach (Word.InlineShape ils in doc.InlineShapes)
{
if (ils != null && ils.Type == Word.WdInlineShapeType.wdInlineShapePicture)
{
PdfPCell imageCell = new PdfPCell();
ils.Select();
doc.ActiveWindow.Selection.Copy();
System.Drawing.Image img = Clipboard.GetImage();
byte[] imgb = null;
using (MemoryStream ms = new MemoryStream())
{
img.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
imgb = ms.ToArray();
}
Image wordPic = Image.GetInstance(imgb);
imageCell.AddElement(wordPic);
table.AddCell(imageCell);
}
}
myDocument.Add(table);
myDocument.Close();
myPDFWriter.Close();
wordDoc = myMemoryStream.ToArray();
}
// Cleanup:
Clipboard.Clear();
(doc as Word._Document).Close(Word.WdSaveOptions.wdDoNotSaveChanges, x, x);
Marshal.FinalReleaseComObject(doc);
Marshal.FinalReleaseComObject(docs);
(app as Word._Application).Quit(x, x, x);
Marshal.FinalReleaseComObject(app); // Word encounters exception here.
doc = null;
docs = null;
app = null;
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
try { File.Delete(tmpPath); }
catch { }
return wordDoc;
}
This doesn't always happen the first time I read the file. When I read it a second or third time, I usually get the error.
Is there any way I can prevent the error from showing?