
Have read a few variants of this, but nothing exactly addresses the problem I have.

I'm part of a data migration team, and one of our tasks is migrating existing documents from one environment to another and, once they are migrated, maintaining any existing hyperlinks within them.

For relatively new documents (Office 2007+), this is no problem. I've had a look at DocX by Cathal Coffey; NPOI which apparently is unstable and not recommended for use - or at least the part I need anyway; GemBox and others, and while they work perfectly for newer documents, none of them can deal with opening/modifying documents from Word '97. Thankfully documents created under Win 3.1 or Word for Windows 2 are out of scope.

I realise that these documents are very old, are no longer supported and, as such, may pose security risks. I also realise that they should have been maintained and brought up to date by their respective teams, but for whatever reason they haven't been, and now it's my job to try and come up with a way to do this.

Using the oldest version of the COM object I have available (Microsoft Office 14.0 Object Library, Version 8.5.0.0) I run into problems with making changes to Trust Settings, Registry changes etc. Doing all of this leads to its own problems such as having to open the document in protected mode when I need to make changes to it, and besides, when this gets deployed, I won't have access to the Trust Center nor to the Registry. Examining the document in memory shows the Hyperlink collection but won't let me see the details like I can in DocX for example.

Is there a way to do this, or am I going back saying these docs are too old and unsupported, and the relevant teams need to do a better job of maintaining their documents? I thought about maybe trying to read the doc in as HTML and then examining any href attributes; thoughts? Can I get my hands on older versions of the Microsoft DLLs, and even if I can, will they be compatible with VS 2015? 3rd party libraries are an option (GemBox, DocX etc.), but something like Aspose Documents is out of the question as the license is $1000.

Nice to have - something that will work without needing Office installed would be truly the stuff of dreams.

Thanks everyone.

doop_dev
  • Is converting the binary format (.doc) to Open XML (.docx) an option? Then have a look here: http://stackoverflow.com/a/2405508/40347 – Dirk Vollmar Sep 14 '16 at 13:31
  • @DirkVollmar At this point, anything is an option. Will look into that, thank you for replying. – doop_dev Sep 15 '16 at 08:52

1 Answer


The simplest and fastest way would be to convert the documents to Open XML format. This can be done on the command line with wordconv.exe (replace the path with the folder where winword.exe is installed on your machine):

"C:\Program Files\Microsoft Office\Office15\wordconv.exe" -oice -nme <input file> <output file>

where <input file> and <output file> need to be fully qualified path names.

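The command can be applied to every .doc file in the current folder using for. The %~fF modifier expands each file name to a fully qualified path, which wordconv.exe needs (inside a batch file, write %%F instead of %F):

for %F in (*.doc) do "C:\Program Files\Microsoft Office\Office15\wordconv.exe" -oice -nme "%~fF" "%~fFx"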

Once the files are converted you can modify the documents by editing the raw XML inside the zip package or by using Microsoft's Open XML SDK.
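To give an idea of the raw XML route, here is a minimal sketch in Python (chosen only for brevity; the same steps apply from .NET with System.IO.Packaging or the Open XML SDK). It assumes the hyperlinks ended up as external relationships in the package's .rels parts, which is where Word normally stores them; links kept as HYPERLINK field codes inside the document body would not be touched. The function name rewrite_hyperlinks and the two prefix parameters are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the relationship parts (*.rels) inside an Open XML package.
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type Word uses for external hyperlinks.
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def rewrite_hyperlinks(docx_bytes, old_prefix, new_prefix):
    """Return (new_docx_bytes, changed_targets).

    Every hyperlink relationship whose Target starts with old_prefix is
    re-pointed to new_prefix; all other package parts are copied unchanged.
    """
    ET.register_namespace("", PKG_NS)  # keep the default namespace on output
    changed = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Document body, headers, footers etc. each have their own .rels part.
            if item.filename.endswith(".rels"):
                root = ET.fromstring(data)
                touched = False
                for rel in root.iter(f"{{{PKG_NS}}}Relationship"):
                    target = rel.get("Target", "")
                    if (rel.get("Type") == HYPERLINK_TYPE
                            and target.startswith(old_prefix)):
                        rel.set("Target", new_prefix + target[len(old_prefix):])
                        changed.append(rel.get("Target"))
                        touched = True
                if touched:
                    # Only re-serialize parts that were actually modified.
                    data = ET.tostring(root, encoding="UTF-8",
                                       xml_declaration=True)
            dst.writestr(item, data)
    return out.getvalue(), changed
```

You would read a converted .docx into bytes, call rewrite_hyperlinks(data, "http://oldserver/", "http://newserver/") (both prefixes are placeholders) and write the returned bytes to a new file. The returned list of changed targets also gives you a simple audit trail of which links were touched.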

Dirk Vollmar