I need to convert .doc and .docx documents to PDF on the server side using .NET Core. While searching, I came across this question, which has a remarkable answer for the .docx to PDF case: convert the document to HTML first using OpenXMLPowerTools, and then convert the HTML to PDF. That answer also suggests a way to handle .doc files, namely converting them to .docx with b2xtranslator, a library that converts Microsoft Office binary files to Open XML format files. What I am missing is how to use this library. I can't find any sample showing how to convert a .doc file with it, only this comment on this question.

Based on that, I tried to use the library, but I hit a dead end. This is my code:

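// Check the file extension (case-insensitive, so ".DOC" is handled too)
FileInfo file = new FileInfo(textBox1.Text);

if (file.Extension.Equals(".doc", StringComparison.OrdinalIgnoreCase))
{
    // Dispose the input stream once the conversion is done
    using (FileStream streamDocFile = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
    {
        // Parse the binary .doc file
        var fileDoc = new b2xtranslator.DocFileFormat.WordDocument(new b2xtranslator.StructuredStorage.Reader.StructuredStorageReader(streamDocFile));

        // Create the target .docx package ("report.doc" becomes "report.docx")
        var fileDocx = b2xtranslator.OpenXmlLib.WordprocessingML.WordprocessingDocument.Create(file.Name + "x", b2xtranslator.OpenXmlLib.OpenXmlPackage.DocumentType.Document);

        // Map the parsed document onto the .docx package
        b2xtranslator.WordprocessingMLMapping.Converter.Convert(fileDoc, fileDocx);
    }
}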

My questions are:

  1. How do I write the .docx file? I don't know whether the code is right, because I am confused about how to write the fileDocx object to a file and check it.
  2. How do I pass the .docx produced by b2xtranslator to Open-XML-PowerTools, so I can convert it to HTML?

Thank you in advance.

Mohsen Bg
abuybuy
  • See if the following code in the b2xtranslator repository helps: https://github.com/EvolutionJobs/b2xtranslator/blob/master/Shell/doc2x/Program.cs – jonsson Aug 26 '22 at 15:46
  • I had no idea the conversion process could be seen there. I'm going to try it. Thank you @jonsson – abuybuy Aug 28 '22 at 00:09

2 Answers

TL;DR: see the OP's answer. This route proved too problematic, and a paid library is usually cheaper (lower total cost of ownership, faster time to market, and someone else to provide support with the tricky bits).

Practically any conversion of MS word-processor formats (RTF, WPS, DOC, DOCX) to PDF should be done directly, for example with the Adobe plug-in for Word or with MS Word's own export/save as PDF.

If you need to use B2X with .NET interop(erability), see code that uses Microsoft.Office.Interop.Word; there are still legacy dependencies to consider (see https://learn.microsoft.com/en-us/archive/blogs/interoperability/binary-to-open-xml-b2x-translator-interoperability-for-the-office-binary-file-formats), potentially an older full .NET suite and an older MS Office. My attempts to install an older .NET on Windows 11 failed.

The second-best alternative is to use a current Open Office suite such as LibreOffice (LO), which can convert DOC directly on PDF export. Here is the B2X demo.doc, opened with default settings in LO Writer 7.4 and exported as PDF with default settings.

[Screenshot: the B2X demo.doc in LO Writer 7.4 and its PDF export]

LibreOffice has good command-line conversion and "Basic App"/DDE support, so you can control any adjustments needed. For the current command-line input and output filters, and thus the different MS .doc versions supported, see https://help.libreoffice.org/7.4/en-US/text/shared/guide/convertfilters.html?&DbPAR=SHARED&System=WIN

Filter name | API name | Media type (extensions)
Microsoft WinWord 1/2/5 | "MS WinWord 5" | application/msword (doc)
Microsoft Word 6.0 | "MS WinWord 6.0" | application/msword (doc)
Microsoft Word 95 | "MS Word 95" | application/msword (doc)
Microsoft Word 95 Template | "MS Word 95 Vorlage" | application/msword (dot)
Word 97–2003 | "MS Word 97" | application/msword (doc wps)
Word 97–2003 Template | "MS Word 97 Vorlage" | application/msword (dot wpt)
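
The command-line route can be driven from .NET Core by launching LibreOffice as a child process. The following is only a minimal sketch under stated assumptions, not a definitive implementation: it assumes LibreOffice is installed on the server and that its `soffice` executable is on the PATH (otherwise pass the full path), and the class and method names (`LibreOfficePdfConverter`, `BuildArguments`, `ExpectedOutputPath`, `ConvertToPdf`) are hypothetical. It relies on the `--headless`, `--convert-to` and `--outdir` options of `soffice`.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LibreOfficePdfConverter
{
    // Arguments for a headless conversion of one input file to PDF.
    public static string[] BuildArguments(string inputPath, string outputDir) =>
        new[] { "--headless", "--convert-to", "pdf", "--outdir", outputDir, inputPath };

    // LibreOffice names the result after the input file, with a .pdf extension.
    public static string ExpectedOutputPath(string inputPath, string outputDir) =>
        Path.Combine(outputDir, Path.GetFileNameWithoutExtension(inputPath) + ".pdf");

    // Runs soffice and returns the path of the PDF it produced.
    // Assumption: "soffice" resolves via PATH unless sofficePath is given.
    public static string ConvertToPdf(string inputPath, string outputDir, string sofficePath = "soffice")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = sofficePath,
            UseShellExecute = false,
            RedirectStandardError = true,
        };
        // ArgumentList handles quoting of paths that contain spaces.
        foreach (var arg in BuildArguments(inputPath, outputDir))
            startInfo.ArgumentList.Add(arg);

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start soffice.");
        string stderr = process.StandardError.ReadToEnd();
        process.WaitForExit();

        // Check for the output file as well as the exit code.
        string pdfPath = ExpectedOutputPath(inputPath, outputDir);
        if (process.ExitCode != 0 || !File.Exists(pdfPath))
            throw new InvalidOperationException($"Conversion failed: {stderr}");
        return pdfPath;
    }
}
```

A call such as `LibreOfficePdfConverter.ConvertToPdf("demo.doc", "out")` would then be expected to leave the result at `out/demo.pdf`. The input filter is normally detected from the file itself, so the same call should apply to both .doc and .docx input.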
K J
In the end, we decided to use a third-party library for document processing. Because we need a stable library and have little time to finish the project, our company decided to buy one.

This answer is not very helpful for those looking for a free way to process .doc files. I hope you find a better one.

Thank you

abuybuy
  • @KJ I deliberately did not state which paid third-party library we use, because I am not paid to advertise it. What I can say is that we tried the trial versions of several paid libraries: Aspose, Spire, Telerik, Syncfusion, and GemBox. We ran them in our Docker environment and analyzed their resource usage and performance. What I chose may or may not fit your needs. – abuybuy Feb 01 '23 at 03:24
  • This answer does not actually answer the question. It would be better if this were submitted as a comment on the original question rather than as an answer to it. – Cosmo Mar 13 '23 at 21:12