5

The interop library is slow and requires MS Office to be installed, which is often undesirable on servers.

I'd like to use Apache POI, but I'm on .NET.

I only need to extract the text portion of the files; I don't need to create Office files or "store information" in them.

I should add that I have a very large document library, and I can't convert it to the newer XML formats.

I don't want to write a parser for the binary files. A library like Apache POI does this for us. Unfortunately, it is only available for the Java platform. Maybe I should consider writing this application in Java.

I still haven't found an open source alternative to POI for .NET, so I think I'll write my application in Java.

one.beat.consumer
Luca Molteni
  • Any luck with this one? I'm simply trying to open Office docs and parse embedded links and log them. I don't need read access, but Interop and an Office suite on the server is not an option. – one.beat.consumer Jun 29 '12 at 18:40

9 Answers

3

For all MS Office versions:

For the new Office (2007):

For the old Office (before 2007):

Yuhong Bao
Ilya Kochetov
  • TX Text Control reads only Word files. Do you know whether any others exist? – Luca Molteni Sep 30 '08 at 14:22
  • @IlyaKochetov - I'm looking to simply scour Office docs for embedded links to do some policing for our content owners. Same issue: interop on the server is a bad idea (license, security, etc.). Can you think of anything else for simply parsing them? I don't need write functionality at all. – one.beat.consumer Jun 29 '12 at 18:39
2

As the new docx formats are inherently XML-based files, you can create and manipulate them programmatically with standard XML DOM techniques, once you know the structure.

The files are basically zip archives with an alternate file extension. Use the System.IO.Packaging namespace to get access to the internal parts of the file, then load them into an XmlDocument to perform the manipulation.

There are examples available for doing this, and the Office Open XML project on SourceForge may be worth looking at for inspiration.
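To make the "zip archive plus XML" idea concrete, here is a minimal sketch in Java (the platform the question considers falling back to) rather than System.IO.Packaging. It relies only on the JDK. The class name `DocxText` and method `extract` are made up for illustration. It assumes the main text lives in the `word/document.xml` part, with paragraphs as `w:p` elements and text runs as `w:t` elements, and it ignores headers, footers, footnotes and any nested text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library. Requires Java 9+ (readAllBytes).
class DocxText {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current zip entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the documents may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Concatenate every text run inside this paragraph.
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extract(in));
        }
    }
}
```

Run it as `java DocxText.java some-file.docx`. The same walk over parts and elements should translate fairly directly to System.IO.Packaging and XmlDocument on .NET.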

As for the older binary formats, these were proprietary to MS, and the only way you're likely to get at the content from within is through the Office object model (requires an Office install), or a third party file converter/parser.

Unfortunately there's nothing first party and native to the .NET platform to work with these files.

Peter Mortensen
ahin4114
2

Check out the Aspose components. They are designed to mimic the Interop functionality without requiring a full Office install on a server.

Jason Z
1

What do you need to do with those files? If you just want to stream them to the user, then the basic file streams are fine. If you want to create new files (perhaps based on a template) to send to the user that the user can open in Office, there are a variety of workarounds.

If you're actually keeping data in Office documents for use by your web site, you're doing it wrong. Office documents, even Excel spreadsheets and Access databases, are not really an appropriate choice for use with an interactive web site.

Community
Joel Coehoorn
1

If the document is in Word 2007 format, you can use the System.IO.Packaging library to interact with it programmatically.
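For the link-logging case raised in the comments, the interesting data is in the package relationships rather than the document body. The sketch below is in Java with only the JDK, not System.IO.Packaging, and the class name `OoxmlLinks` and method `externalLinks` are invented for illustration. It assumes hyperlinks are stored as `Relationship` elements in the package's `.rels` parts, with a `Type` ending in `/hyperlink` and `TargetMode="External"`. It scans every `.rels` part, so links in headers and footers are included too.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library. Requires Java 9+ (readAllBytes).
class OoxmlLinks {
    // Namespace of the Relationship elements inside *.rels parts.
    static final String REL_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";

    /** Collects the targets of all external hyperlink relationships in the package. */
    static List<String> externalLinks(InputStream pkg) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the documents may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        List<String> links = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(pkg)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".rels")) {
                    continue; // only relationship parts are of interest here
                }
                byte[] xml = zip.readAllBytes(); // bytes of the current entry only
                NodeList rels = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml))
                        .getElementsByTagNameNS(REL_NS, "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    Element rel = (Element) rels.item(i);
                    // Match on the suffix of the Type URI instead of the full string.
                    if (rel.getAttribute("Type").endsWith("/hyperlink")
                            && "External".equals(rel.getAttribute("TargetMode"))) {
                        links.add(rel.getAttribute("Target"));
                    }
                }
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            externalLinks(in).forEach(System.out::println);
        }
    }
}
```

Because it only looks at `.rels` parts, the same code should also work on .xlsx and .pptx files. It will not help with the older binary formats.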

RWendi

RWendi
1

In the Java world, there is also JExcelApi. From what I was able to see, it is very clearly written and much cleaner than POI. So maybe even a port of that code to .NET is not out of the question, provided, of course, that you have enough time on your hands.

javashlook
0

OpenOffice.

You can program against it and have it do a lot for you, without spending money on an Office license for the server or exposing the server to the vulnerabilities associated with Office.

0

Microsoft Excel workbooks can be read using an ODBC driver (or is it an OLE DB driver? can't remember) that makes the workbook look like a database table. But I don't know whether that driver is available without the Office Suite itself.

pdc
0

You can use OpenOffice. It has a command-line conversion tool:

Conversion Howto

In short, you define a macro in OpenOffice and you call that macro with a command-line argument to OpenOffice. In that argument the name of the local file (the Office file) is encoded.

It's not a great solution, but it should be workable.

Peter Mortensen
extraneon