7

I'm working on a project (ASP.NET, C#, VB 2010, .NET 4) and I need to read both DOC and DOCX files that I've previously uploaded (the upload part is done). The tricky part is that MS Office is not installed on the server, so I can't use it.

Is there a public library that I can include in my project without having to install anything? Both documents are very simple:

NUMBER TAB STRING  
NUMBER TAB STRING  
NUMBER TAB STRING  
...  

I need to extract the number and the string from each row (paragraph).

Can someone help with this? To repeat: I can't install anything on the server.

laxonline
user1999722
  • doc AND docx? DOCX is a zip-style compressed archive of XML documents and possibly binary data (if images are in there, etc.); DOC is binary coded, so a totally different engine is needed. – TomTom Jan 22 '13 at 09:30
  • For DOCX there are free and commercial libraries BUT for DOC the only options I know of are commercial... Is a commercial library an option ? – Yahia Jan 22 '13 at 09:33
  • @TomTom Yes, I know that DOC and DOCX have different backgrounds, but I'm interested in whether both can be handled by one library. Or are there maybe two libraries that I can later combine on my own... Thanks – user1999722 Jan 22 '13 at 09:58
  • 2 libraries definitely. And likely a Commercial one for .doc – TomTom Jan 22 '13 at 10:20
  • You totally do not need a commercial library. Well, unless you really can't install *anything*, rather than just not being able to install large, client-facing applications. Otherwise, you can use the [Office IFilter](http://www.microsoft.com/en-us/download/details.aspx?id=20109), which is technically installing *something*, but wouldn't a third-party library also be something? – neminem May 02 '13 at 16:00

4 Answers

5

We can now use the open-source NPOI library (a .NET port of Apache POI), which also supports docx, xls and xlsx. DocX is another open-source library for creating Word documents.

For DOCX I'd suggest the Open XML API (Open XML SDK). Microsoft developed Open XML so that Office files can be created by working with the underlying XML files through this API, though the latest version, 2.5, was released in 2013, which is 5 years ago.
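As a rough illustration of the Open XML route for DOCX, here is a minimal sketch that returns the text of each paragraph. It assumes the project references `DocumentFormat.OpenXml.dll`; the class and method names (`DocxParagraphReader`, `GetText`, `ReadParagraphs`) are made up for this example. A tab inside a run is stored as a `<w:tab/>` element rather than a character, so the sketch re-inserts it as `'\t'` to keep the NUMBER TAB STRING layout intact.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK itself.
static class DocxParagraphReader
{
    // Builds the plain text of one paragraph. Text runs are appended as-is;
    // <w:tab/> elements (TabChar) are turned back into '\t' characters.
    public static string GetText(Paragraph paragraph)
    {
        var sb = new StringBuilder();
        foreach (OpenXmlElement element in paragraph.Descendants())
        {
            Text text = element as Text;
            if (text != null)
            {
                sb.Append(text.Text);
            }
            else if (element is TabChar)
            {
                sb.Append('\t');
            }
        }
        return sb.ToString();
    }

    // Opens the file read-only and returns one string per top-level paragraph.
    public static List<string> ReadParagraphs(string path)
    {
        var result = new List<string>();
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            foreach (Paragraph p in body.Elements<Paragraph>())
            {
                result.Add(GetText(p));
            }
        }
        return result;
    }
}
```

Each returned string can then be split on the tab character to get the number and the string. This only covers DOCX; it does nothing for the binary DOC format.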

stop-cran
Pavel Kudinov
  • According to the NPOI link you provided it says "Support xls, xlsx, docx." - there is no mention whatsoever of DOC !!! – Yahia Jan 22 '13 at 09:51
  • @Yahia Hm... A commercial library would not be such a good solution. If you know a public one, that would be great :) As for NPOI, yes, I've also seen that it does not support DOC files :( – user1999722 Jan 22 '13 at 09:56
  • Good news, it now supports both 2003 and 2007 files: "POI is an open source project which can help you read/write Office 2003/2007 files". And yes, doc is not stable (http://npoi.codeplex.com/discussions/360441)... – Pavel Kudinov Jan 22 '13 at 10:35
2

You can use Code7248.word_reader.dll.

Below is sample code showing how to use it.

Add a reference to this DLL in your project and copy the code below.

using System;
using System.Collections.Generic;
using System.Text;
//add extra namespaces
using Code7248.word_reader;


namespace testWordRead
{
    class Program
    {
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }
        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string so the backslashes are not treated as escape sequences
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}
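
Once the text is extracted, each row still has to be split into its number and string. A minimal sketch of that step follows, assuming the extractor keeps the tab and the line breaks of the original document (not verified for this DLL). `RowParser` is a hypothetical helper name, and it uses only the base class library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns "NUMBER<TAB>STRING" lines into (number, string) pairs.
static class RowParser
{
    public static List<KeyValuePair<int, string>> Parse(string text)
    {
        var rows = new List<KeyValuePair<int, string>>();

        // Accept Windows, old Mac and Unix line endings; drop empty lines.
        string[] lines = text.Split(
            new[] { "\r\n", "\r", "\n" },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Only the first tab separates the number from the string.
            int tab = line.IndexOf('\t');
            if (tab < 0)
            {
                continue; // no tab: not a data row
            }

            int number;
            if (!int.TryParse(line.Substring(0, tab).Trim(), out number))
            {
                continue; // left part is not an integer: skip the line
            }

            rows.Add(new KeyValuePair<int, string>(number, line.Substring(tab + 1).Trim()));
        }
        return rows;
    }
}
```

For example, `RowParser.Parse(text)` could be called on the string returned by `ExtractText()` in place of the `Console.WriteLine(text)` call. Lines that do not match the expected layout are silently skipped, which may or may not be what you want.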
Sagar Modi
  • Do you know where I can view the license for this DLL? Does its license allow distributing the DLL to others? – Demona May 02 '17 at 09:52
1

Update: NPOI supports docx now. Please try the latest release (NPOI 2.0 beta)

Tony Qu
-1

You can do it like this:

using System.IO;
using System.Text;
using Spire.Doc;
    
namespace ReadTextLineByLine{
    class Program {
        static void Main(string[] args) {
            //Create a Document object
            Document doc = new Document();
            //Load a Word file
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\data.docx");
            //Convert the text in Word line by line into a txt file
            doc.SaveToTxt("result.text", Encoding.UTF8);
            //Read all lines of the txt file, using the same encoding it was written with
            string[] lines = File.ReadAllLines("result.text", Encoding.UTF8);
        }
    }
}
Elikill58