
I'm using Visual Studio 2010 with .NET 4.0 and trying to extract text from a Word document using the Open XML SDK 2.5. The assemblies it needs (WindowsBase and DocumentFormat.OpenXml) are referenced in my current solution.

Although I have referenced both WindowsBase and DocumentFormat.OpenXml, I cannot use SPFile.

For reference, I'm trying to implement @KyleM's solution from this Stack Overflow thread: How to extract text from MS office documents in C#

I've also added using statements for both DocumentFormat.OpenXml and System.IO.Packaging.

    It is possibly enough to pass the full name of your file as a string (e.g. change the parameter to String fullname, then use WordprocessingDocument wdDoc = WordprocessingDocument.Open(fullname, false)). SPFile is a class in one of the SharePoint libraries, so it is probably not what you need if you are not using SharePoint. –  Jul 15 '13 at 07:43
    Yeah, I did some further investigation last night and found out that the DLL I would need is part of SharePoint. Apparently it also will not compile correctly unless you're actually on a server. I'm deploying this application on a desktop, so it shouldn't be an issue. I'll try out what you said when I get home tonight. – Thomas Robert Horn Jul 15 '13 at 22:18

1 Answer

Just for the record, the suggestions from @bibadia produced this:

using DocumentFormat.OpenXml.Packaging;
using System;
using System.Text;
using System.Xml;

namespace MyProject.Helpers
{
    public static class WordHelper
    {
        // Returns the text of a .docx file, one line per paragraph.
        public static string TextFromWord(string fileName)
        {
            const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

            StringBuilder textBuilder = new StringBuilder();
            using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(fileName, false))
            {
                // Manage namespaces to perform XPath queries.  
                NameTable nt = new NameTable();
                XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
                nsManager.AddNamespace("w", wordmlNamespace);

                // Get the main document part from the package and load its XML
                // into an XmlDocument, disposing the part stream once it is read.
                XmlDocument xdoc = new XmlDocument(nt);
                using (System.IO.Stream partStream = wdDoc.MainDocumentPart.GetStream())
                {
                    xdoc.Load(partStream);
                }

                XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
                foreach (XmlNode paragraphNode in paragraphNodes)
                {
                    XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
                    foreach (System.Xml.XmlNode textNode in textNodes)
                    {
                        textBuilder.Append(textNode.InnerText);
                    }
                    textBuilder.Append(Environment.NewLine);
                }

            }
            return textBuilder.ToString();
        }
    }
}
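
For comparison, here is a minimal sketch of the same extraction using the SDK's strongly typed classes instead of XPath. It assumes the Open XML SDK 2.5 assembly (DocumentFormat.OpenXml) and WindowsBase are referenced; the names TypedWordHelper and TextFromWordTyped are hypothetical and only used for illustration. It is not the code from the linked thread, just one way the same idea might be written.

```csharp
using System;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace MyProject.Helpers
{
    // Hypothetical helper name, not part of the SDK or the original answer.
    public static class TypedWordHelper
    {
        public static string TextFromWordTyped(string fileName)
        {
            StringBuilder textBuilder = new StringBuilder();

            // Open the package read-only (second argument: isEditable = false).
            using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(fileName, false))
            {
                MainDocumentPart mainPart = wdDoc.MainDocumentPart;
                if (mainPart == null || mainPart.Document == null || mainPart.Document.Body == null)
                {
                    // Assumption: a document without a body yields an empty string.
                    return string.Empty;
                }

                // Paragraph corresponds to w:p, Text corresponds to w:t.
                foreach (Paragraph paragraph in mainPart.Document.Body.Descendants<Paragraph>())
                {
                    foreach (Text text in paragraph.Descendants<Text>())
                    {
                        textBuilder.Append(text.Text);
                    }
                    textBuilder.Append(Environment.NewLine);
                }
            }

            return textBuilder.ToString();
        }
    }
}
```

Calling it looks the same as the XPath version, e.g. string text = TypedWordHelper.TextFromWordTyped(path); where path is the full name of a .docx file on disk. Like the XPath version, it only reads the main document part, so headers, footers, and footnotes are not included.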
Leonel Sanches da Silva