I am trying to convert a docx file to a PDF. I am using code from Stack Overflow, modified to allow dynamic selection of the file to open (rather than a hard-coded value). When I run it, I get a "could not find file" exception on the Open() method. I select the file using a FileUpload control, so I know the file is there. What's going on?

Here is my code:

using System;
using System.IO;

using Microsoft.Office.Interop.Word;
using OpenXmlPowerTools;

namespace DocxToPdf
{
    public partial class WebForm1 : System.Web.UI.Page
    {

        public Microsoft.Office.Interop.Word.Document wordDoc;

        protected void Page_Load(object sender, EventArgs e)
        {

        }

        protected void UploadButton_Click(object sender, EventArgs e)
        {
            if (DocxFileUpload.HasFile)
            {
                string docxFile = DocxFileUpload.PostedFile.FileName;
                FileInfo fiFile = new FileInfo(docxFile);
                if (Util.IsWordprocessingML(fiFile.Extension))
                {
                    Guid pdfFileGuid = Guid.NewGuid();
                    string pdfFileLoc = string.Format(@"c:\windows\temp\{0}.pdf", pdfFileGuid.ToString());

                    Microsoft.Office.Interop.Word.Application appWord = new Microsoft.Office.Interop.Word.Application();
                    wordDoc = appWord.Documents.Open(docxFile);
                    wordDoc.ExportAsFixedFormat(pdfFileLoc, WdExportFormat.wdExportFormatPDF);
                    MsgLabel.Text = "File converted to PDF";

                }
                else
                {
                    MsgLabel.Text = "Not a WordProcessingML document.";
                }
            }
            else
            {
                MsgLabel.Text = "You have not specified a file.";

            }
        }
    }
}

The error occurs on the "wordDoc = appWord.Documents.Open(docxFile);" line.

The FileUpload control's FileName property has just the file name, not the fully qualified path. I understand why I'm getting a "file not found" error: the file name doesn't include the fully qualified path. My question to the group is, how do I get the fully qualified path and file name so I can open it? I've run a debug session and examined all the properties of the FileUpload control and the FileInfo object, but they don't have it. The "FullPath" property of the FileInfo object is set to "c:\Program Files (x86)\IIS Express\myfile.docx", but that's not where the file is located.

Here's some more information about the error: Exception System.Runtime.InteropServices.COMException in DocxToPdf.dll (Sorry, we couldn't find your file. Is it possible it was moved, renamed or deleted? C:\Windows...\myfile.docx...

I've googled around on this, but so far no luck. Please help! Thanks.

Pete Hesse
  • problem lies in how `DocxFileUpload.PostedFile.FileName` is getting set. without that code, can't really help – bnem Jun 19 '17 at 19:51

1 Answer

First off, be aware that a web application involves two machines: the client (where the browser runs) and the server (where your app lives). Each has its own file system. The server cannot access the client's file system and vice versa, for obvious security reasons. It may work on a development machine because you are running the site locally, so both roles share one disk, but it would never work in a production environment.

So Microsoft Word, running on the server, cannot open a file that is located on the client machine. Period. The client can upload a file, and the FileUpload control will let you access the byte stream, but it doesn't automatically save the file on the server. You can't access the original path, either, because that path is on the client's file system and the names of the user's folders are private information.

To get this scheme to work at all, you first need to save the uploaded file somewhere on the server using FileUpload.SaveAs, then open that saved copy in Word. Something like this:

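// Save the upload to a server-side temp file; FileName alone is not a usable path.
var filePath = Path.GetTempFileName();
DocxFileUpload.SaveAs(filePath);
var appWord = new Microsoft.Office.Interop.Word.Application();
var wordDoc = appWord.Documents.Open(filePath);
var convertedFilePath = Path.GetTempFileName();
wordDoc.ExportAsFixedFormat(convertedFilePath, WdExportFormat.wdExportFormatPDF);
// Close the document and quit Word so the temp files are not left locked.
wordDoc.Close(false);
appWord.Quit();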

You will then need to provide some means of getting the converted file back to the browser, by writing it to the HTTP response. Example:

Response.Clear();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", "attachment; filename=Converted.pdf");
Response.TransmitFile(convertedFilePath);
// Flush so the file is actually sent before the temp file is deleted.
Response.Flush();

Don't forget to clean up your files afterward, or you will run out of disk space as more and more users use your application:

}
finally
{
    File.Delete(filePath);
    File.Delete(convertedFilePath);
}

I put the delete commands in a finally block so that they run even if something goes wrong, e.g. the request times out. You need those files to get cleaned up no matter what. You might also want to schedule a system task to clean up the folder on a nightly basis, just in case one of the files is locked due to Word being hung, that sort of thing.

Also, make sure the account your application's AppPool runs under can read and write to the temp folder.

If you want to use a separate handler for downloading

If you want to show other content alongside the PDF, you'd have to use a separate handler for downloading. Here's a rough outline:

There are three URLs used in this solution:

  • Upload.aspx: the page that allows the user to specify a file for uploading
  • Confirm.aspx: the page that is displayed in response, which includes a large iFrame
  • File.ashx: the handler that returns the PDF that is displayed in the iFrame

You've already coded Upload.aspx.

Confirm.aspx needs code to accept the upload, save locally, open Word, and convert the file. The path of the converted file needs to be converted to a token of some kind. The page then needs to return a page that contains an iFrame pointed at File.ashx?docID=token.

File.ashx needs to set the response headers, use the token to recreate the path of the PDF file, and return the file over the HttpResponse.
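
As a minimal sketch of the token step, assuming the simplest scheme (a GUID used as the PDF's file name and passed as the docID value): the helper below is hypothetical and not part of ASP.NET. `PdfTokens`, `NewToken`, and `TryResolvePath` are illustrative names, and the folder argument stands for whatever folder you chose for converted files. Parsing the token strictly as a GUID means the handler never builds a path out of arbitrary query-string text.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps a GUID token to a PDF path inside one known folder.
public static class PdfTokens
{
    // Create a token to use both as the file name and as the docID value.
    public static string NewToken()
    {
        return Guid.NewGuid().ToString("N"); // 32 hex digits, no dashes
    }

    // Rebuild the PDF path from a token; returns false if the token is not a GUID.
    public static bool TryResolvePath(string token, string folder, out string path)
    {
        path = null;
        Guid id;
        if (!Guid.TryParseExact(token, "N", out id))
        {
            return false; // rejects input such as "..\web.config"
        }
        path = Path.Combine(folder, id.ToString("N") + ".pdf");
        return true;
    }
}
```

In File.ashx, the token would be read from the docID query-string value and passed to `TryResolvePath`; if it returns false, or the resolved file does not exist, returning a 404 instead of a file is a reasonable choice.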

At some point you will need to figure out how to clean up the temp folder, perhaps with a job that runs regularly and deletes any .doc or .pdf file older than 10 minutes, that sort of thing.
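
A cleanup job along those lines could be sketched as follows. This is an assumption-laden outline rather than a finished implementation: `TempFolderCleaner` and `DeleteStaleFiles` are hypothetical names, the folder, maximum age, and extensions are whatever you pass in, and something else (a scheduled task, for instance) has to call it. Files that are still locked are skipped and picked up on a later run.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical cleanup helper; names and parameters are illustrative.
public static class TempFolderCleaner
{
    // Deletes files in `folder` whose extension is in `extensions` and whose
    // last write time is older than `maxAge`. Returns how many were deleted.
    public static int DeleteStaleFiles(string folder, TimeSpan maxAge, params string[] extensions)
    {
        DateTime cutoffUtc = DateTime.UtcNow - maxAge;
        int deleted = 0;
        foreach (string file in Directory.GetFiles(folder))
        {
            string ext = Path.GetExtension(file);
            bool wanted = extensions.Any(
                e => string.Equals(e, ext, StringComparison.OrdinalIgnoreCase));
            if (!wanted || File.GetLastWriteTimeUtc(file) > cutoffUtc)
            {
                continue; // wrong file type, or not old enough yet
            }
            try
            {
                File.Delete(file);
                deleted++;
            }
            catch (IOException) { /* still locked, e.g. by a hung Word; retry next run */ }
            catch (UnauthorizedAccessException) { /* no permission; skip */ }
        }
        return deleted;
    }
}
```

For the rule described above, the call would look like `TempFolderCleaner.DeleteStaleFiles(folder, TimeSpan.FromMinutes(10), ".doc", ".pdf")`.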

John Wu
  • One line of code you provided is puzzling to me: `var wordDoc = appWord.Documents.Open(docxFile);`. Shouldn't the value in the parentheses be "filePath"? That variable contains the fully qualified path and file name of the saved file. "DocxFile" is simply the file name. How would the Open() method find the file? – Pete Hesse Jun 20 '17 at 16:16
  • Is it necessary to create an HTTP Handler, as is done in the link you provided? Or can the above code be put directly into my current asp page? My next steps are to display the pdf file on the screen, with a 'save' button. Clicking that will save it to a database. Once the file is in the Response object, will it stay there or will it get overwritten? – Pete Hesse Jun 20 '17 at 19:41
  • If you want to display the PDF along with a button of your own, you will pretty much have to show that PDF in an iFrame, so yes, you'd need a handler, since your web page would be serving the content that contains the iFrame. – John Wu Jun 20 '17 at 19:59
  • How do I integrate the http handler into my existing code? The UploadButton_Click() method has to run to get the file, before the http handler can display it in the iframe. Do I "invoke" the http handler from within the above code, or is there some other method for doing this? – Pete Hesse Jun 22 '17 at 16:03
  • _The path of the converted file needs to be converted to a token of some kind._ Would you please elaborate? I don't understand. (btw, Thank you for all your help. Much appreciated). – Pete Hesse Jun 22 '17 at 19:26
  • You need some way to pass enough information in the URL for the handler to figure out the path, but it isn't a great idea to pass the path itself, for security reasons. So you could encrypt the path, or store it server side in a database table and use the primary key from that table. – John Wu Jun 22 '17 at 19:36
  • If you need something simple, you can use a GUID as the file name and then pass the GUID in the URL. The handler would have to reconstruct the path, but that should be trivial. – John Wu Jun 22 '17 at 19:41
  • Ok, I'm going to encrypt the path. But how do I get the encrypted string to the handler? I guess I'm not clear on how the handler gets invoked. – Pete Hesse Jun 23 '17 at 16:23
  • You can pass it in the URL that you set on the iFrame, e.g. `File.ashx?docID=token`. – John Wu Jun 23 '17 at 17:58
  • In the handler code, how do I retrieve the encrypted string? There is no Request object so I can't use `QueryString`. Do I use `context.Request.QueryString`? – Pete Hesse Jun 23 '17 at 19:11
  • Yes, use `context.Request.QueryString["docID"]` – John Wu Jun 23 '17 at 19:11