I convert a PDF file to a Word file using the PDFFocus.net DLL, but my system needs a .docx file. I tried different approaches. There are some libraries available, but they are not free. This is my PDF-to-DOC conversion code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using iTextSharp.text;
    using iTextSharp.text.pdf;

    namespace ConsoleApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Load the PDF, then write it out as a Word file.
                SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
                f.OpenPdf(@"E:\input.pdf");

                f.ToWord(@"E:\input.doc");
            }
        }
    }

This works successfully. Then I tried the code below to convert the .doc to .docx, but it gives me an error.

    // Open a document.
    Document doc = new Document("input.doc");
    // Save the document.
    doc.save("output.docx");

Can anyone help me, please?

Lasa
  • A library like [Gembox.Document](http://www.gemboxsoftware.com/document/overview) or [SpireDoc.NET Free](http://www.e-iceblue.com/Introduce/free-doc-component.html) might help - load the `.doc` and save as `.docx` – marc_s Dec 05 '15 at 21:43
  • Based on a quick look at the [documentation](http://www.sautinsoft.net/help/pdf-to-word-tiff-images-text-rtf-csharp-vb-net/index.aspx) there is no sign that PDFFocus supports anything other than RTF output (even if using a `.doc` file extension). Are you sure it can generate the Open XML based Word format (`.docx`)? – Richard Dec 06 '15 at 12:18

2 Answers

Yes, as Erop said, you can use the Microsoft Word 14.0 Object Library. Then it's really easy to convert from .doc to .docx, e.g. with a function like this:

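    // Requires a reference to Microsoft.Office.Interop.Word and:
    // using System; using System.IO; using Microsoft.Office.Interop.Word;
    public void ConvertDocToDocx(string path)
    {
        // Check the extension before starting Word so no instance is left running.
        if (!path.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
            return;

        var sourceFile = new FileInfo(path);
        Application word = new Application();
        try
        {
            var document = word.Documents.Open(sourceFile.FullName);

            // ChangeExtension only touches the extension, unlike Replace(".doc", ".docx").
            string newFileName = System.IO.Path.ChangeExtension(sourceFile.FullName, ".docx");
            document.SaveAs2(newFileName, WdSaveFormat.wdFormatXMLDocument,
                             CompatibilityMode: WdCompatibilityMode.wdWord2010);
            document.Close();
        }
        finally
        {
            // Quit Word even if opening or saving fails.
            word.Quit();
        }

        // Remove the original .doc only after a successful save.
        File.Delete(path);
    }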

Make sure to add `CompatibilityMode: WdCompatibilityMode.wdWord2010`; otherwise the file will stay in compatibility mode. Also make sure that Microsoft Office is installed on the machine where you want to run the application.
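
The file-name handling in a function like this can be checked without Word installed. Here is a minimal sketch, assuming a hypothetical helper `DocxPathHelper.GetDocxPath` (the name is mine, it is not part of any library). It relies on `Path.GetExtension` and `Path.ChangeExtension`, so only the extension is inspected and changed; a plain string replace of ".doc" would also alter a folder name that happens to contain ".doc".

```csharp
using System;
using System.IO;

public static class DocxPathHelper
{
    // Hypothetical helper: returns the target .docx path for a .doc file,
    // or null when the input does not have a .doc extension (e.g. .docx, .pdf).
    public static string GetDocxPath(string path)
    {
        string extension = Path.GetExtension(path);
        if (!string.Equals(extension, ".doc", StringComparison.OrdinalIgnoreCase))
            return null;

        // Only the extension is swapped; the rest of the path is left untouched.
        return Path.ChangeExtension(path, ".docx");
    }
}
```

You could call `DocxPathHelper.GetDocxPath(path)` first and skip the Word automation entirely when it returns null.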

Another thing: I don't know PDFFocus.net, but have you tried converting directly from PDF to .docx, like this?

    static void Main(string[] args)
    {
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
        f.OpenPdf(@"E:\input.pdf");

        // Same call as before, only with a .docx target.
        f.ToWord(@"E:\input.docx");
    }

I would assume that this works, but it's only an assumption.

Dave
  • Thank you very much, Dave. It works for me. I tried PDFFocus.net with .docx, but PDFFocus.net only supports .doc files. Anyway, thank you very much for your answer. – Lasa Dec 06 '15 at 02:40
  • Note that there are 2 Word Interop assemblies. I tested the first code block successfully with v15.0 in a console app. The document opened in Compatibility Mode even though I made sure I had that `CompatibilityMode` argument, but I don't really think it matters. Caveat: it shouldn't be run from "server" code, which includes a web site or a Windows Service, even locally, because that code runs in the context of a different user than the logged-in one and fails with `CO_E_SERVER_EXEC_FAILURE (0x80080005): Server execution failed`. See https://support.microsoft.com/en-us/help/257757/considerations-for-server-side-automation-of-office – vapcguy Nov 07 '18 at 22:39
  • Btw, if you do try this and get that error, I found this to get past the error: https://stackoverflow.com/questions/3477086/accessing-office-word-object-model-through-asp-net-results-in-failed-due-to-the – vapcguy Nov 07 '18 at 22:46

Try using the Microsoft.Office.Interop.Word assembly.

An MSDN article can be found here.

Add the references to your project, then enable their use in a code module with `using` directives, as in this example from the above link:

    using System.Collections.Generic;
    using Word = Microsoft.Office.Interop.Word;

Drew
Ghost
  • Thanks for posting an answer to this question! This answer is very short though and doesn't provide much context. Please explain some of the reasoning behind it, and it will become much more useful for the asker and future readers. Thanks! – Maximillian Laumeister Dec 05 '15 at 23:08
  • I tweaked it a bit. Welcome to the Stack. – Drew Dec 05 '15 at 23:23