54

I'm attempting to create a PDF file from an HTML file. After looking around a little, I found wkhtmltopdf, which looks perfect. I need to call this .exe from the ASP.NET server. I've attempted:

    Process p = new Process();
    p.StartInfo.UseShellExecute = false;
    p.StartInfo.FileName = HttpContext.Current.Server.MapPath("wkhtmltopdf.exe");
    p.StartInfo.Arguments = "TestPDF.htm TestPDF.pdf";
    p.Start();
    p.WaitForExit();

No files are created on the server. Can anyone give me a pointer in the right direction? I put the wkhtmltopdf.exe file in the top-level directory of the site. Is there anywhere else it should be kept?


Edit: If anyone has better solutions to dynamically create pdf files from html, please let me know.

Ryan Gates
Sean
  • Is your application producing any exceptions as a result of this operation? Is the command line operation producing any exceptions or errors? – Nathan Taylor Aug 26 '09 at 01:48
  • No it is not producing any exceptions. I actually see the command prompt come up really fast. If I don't put the: HttpContext.Current.Server.MapPath(), I do get a file not found exception. – Sean Aug 26 '09 at 01:55
  • You may be able to use FileMon or other sysinternals tool to see what file was not found. Have you tried specifying absolute paths too? – Brian Lyttle Aug 26 '09 at 02:01
  • See http://stackoverflow.com/questions/tagged/pdf-generation. – John Saunders Sep 10 '09 at 16:29

11 Answers

51

Update:
My answer below creates the PDF file on disk. I then streamed that file to the user's browser as a download. Consider using something like Hath's answer below to get wkhtmltopdf to output to a stream instead and then send that directly to the user; that will bypass lots of issues with file permissions etc.

My original answer:
Make sure you've specified an output path for the PDF that is writeable by the ASP.NET process of IIS running on your server (usually NETWORK_SERVICE I think).

Mine looks like this (and it works):

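/// <summary>
/// Convert the HTML page at a given URL to a PDF file using the open-source tool wkhtmltopdf
/// </summary>
/// <param name="Url">URL of the page to convert</param>
/// <param name="outputFilename">Output file name, without path or extension</param>
/// <returns>true if wkhtmltopdf exited with code 0 or 2</returns>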
public static bool HtmlToPdf(string Url, string outputFilename)
{
    // assemble destination PDF file name
    string filename = ConfigurationManager.AppSettings["ExportFilePath"] + "\\" + outputFilename + ".pdf";

    // get proj no for header
    Project project = new Project(int.Parse(outputFilename));

    var p = new System.Diagnostics.Process();
    p.StartInfo.FileName = ConfigurationManager.AppSettings["HtmlToPdfExePath"];

    string switches = "--print-media-type ";
    switches += "--margin-top 4mm --margin-bottom 4mm --margin-right 0mm --margin-left 0mm ";
    switches += "--page-size A4 ";
    switches += "--no-background ";
    switches += "--redirect-delay 100";

    p.StartInfo.Arguments = switches + " " + Url + " " + filename;

    p.StartInfo.UseShellExecute = false; // needs to be false in order to redirect output
    p.StartInfo.RedirectStandardOutput = true;
    p.StartInfo.RedirectStandardError = true;
    p.StartInfo.RedirectStandardInput = true; // redirect all 3, as it should be all 3 or none
    p.StartInfo.WorkingDirectory = System.IO.Path.GetDirectoryName(p.StartInfo.FileName); // directory containing the exe

    p.Start();

    // read the output here...
    string output = p.StandardOutput.ReadToEnd(); 

    // ...then wait n milliseconds for exit (as after exit, it can't read the output)
    p.WaitForExit(60000); 

    // read the exit code, close process
    int returnCode = p.ExitCode;
    p.Close(); 

    // if 0 or 2, it worked (not sure about other values, I want a better way to confirm this)
    return (returnCode == 0 || returnCode == 2);
}
MGOwen
  • +1 Thanks for the code. This is working perfectly for me as well. Have you ever found out more info about the return codes? – Jeremy Apr 24 '10 at 00:27
  • No, I couldn't find any info on them. Try the wkhtmltopdf area on Google code. (If this is the answer you used, you can accept it as the answer for other people with the same problem who stumble across this question later) – MGOwen Apr 27 '10 at 01:44
  • 3
    'return (returnCode <= 2)' should be 'return (returnCode == 0 || returnCode == 2)' because you'll receive '1' if the output file already exists, so check before executing process. – bob May 29 '10 at 09:59
  • I don't see how this code could work on IIS. You will get access denied because the default IIS user account is not allowed to execute an .exe file. – Tomas Apr 14 '11 at 12:05
  • +1 Very helpful. Thanks a bunch for posting this code. Not entirely sure you need the WaitForExit() call. Start never returns for me immediately... are you waiting on the output to read out? – JasonCoder Apr 17 '12 at 17:05
  • I'm getting an error when I try to execute the DownloadPDF action in my project two or more times. The first time works, but the next ones throw a "File is open by another process" exception. Can someone help fix this? – Phoenix_uy Sep 12 '12 at 23:34
  • I have received "1" as return code but the directory was empty, so I guess @bob is not completely right. – marquito Sep 06 '13 at 14:20
  • In debug mode I get a lot of these lines when it reaches p.StandardOutput.ReadToEnd(): The thread '' (0x1e40) has exited with code 0 (0x0)... and it keeps going, endlessly. – Pnctovski Sep 23 '13 at 11:38
41

I had the same problem when I tried using MSMQ with a Windows service, but it was very slow for some reason (the process part).

This is what finally worked:

private void DoDownload()
{
    var url = Request.Url.GetLeftPart(UriPartial.Authority) + "/CPCDownload.aspx?IsPDF=False&UserID=" + this.CurrentUser.UserID.ToString();
    var file = WKHtmlToPdf(url);
    if (file != null)
    {
        Response.ContentType = "application/pdf";
        Response.BinaryWrite(file);
        Response.End();
    }
}

public byte[] WKHtmlToPdf(string url)
{
    var fileName = " - ";
    var wkhtmlDir = "C:\\Program Files\\wkhtmltopdf\\";
    var wkhtml = "C:\\Program Files\\wkhtmltopdf\\wkhtmltopdf.exe";
    var p = new Process();

    p.StartInfo.CreateNoWindow = true;
    p.StartInfo.RedirectStandardOutput = true;
    p.StartInfo.RedirectStandardError = true;
    p.StartInfo.RedirectStandardInput = true;
    p.StartInfo.UseShellExecute = false;
    p.StartInfo.FileName = wkhtml;
    p.StartInfo.WorkingDirectory = wkhtmlDir;

    string switches = "";
    switches += "--print-media-type ";
    switches += "--margin-top 10mm --margin-bottom 10mm --margin-right 10mm --margin-left 10mm ";
    switches += "--page-size Letter ";
    p.StartInfo.Arguments = switches + " " + url + " " + fileName;
    p.Start();

    //read output
    byte[] buffer = new byte[32768];
    byte[] file;
    using(var ms = new MemoryStream())
    {
        while(true)
        {
            int read =  p.StandardOutput.BaseStream.Read(buffer, 0,buffer.Length);

            if(read <=0)
            {
                break;
            }
            ms.Write(buffer, 0, read);
        }
        file = ms.ToArray();
    }

    // wait for exit
    p.WaitForExit(60000);

    // read the exit code, close process
    int returnCode = p.ExitCode;
    p.Close();

    return returnCode == 0 ? file : null;
}

Thanks Graham Ambrose and everyone else.

Hath
  • I'm trying to test your solution; it would be a great help to me if it works. But I want to convert my .aspx to a PDF, not a URL. Is that possible the same way? I changed your var to this: var url = HttpContext.Current.Server.MapPath("~/wkhtmltopdf/chartImage.aspx"); but it didn't work. – Armance Dec 01 '11 at 17:29
  • 2
    @astrocybernaute aspx needs a server to produce html from it so you need to call it using a server and not directly :) – Joel Peltonen Dec 19 '12 at 07:43
20

OK, so this is an old question, but an excellent one. And since I did not find a good answer, I made my own :) Also, I've posted this super simple project to GitHub.

Here is some sample code:

var pdfData = HtmlToXConverter.ConvertToPdf("<h1>SOO COOL!</h1>");

Here are some key points:

  • No P/Invoke
  • No creating of a new process
  • No file-system (all in RAM)
  • Native .NET DLL with intellisense, etc.
  • Ability to generate a PDF or PNG (HtmlToXConverter.ConvertToPng)
Timothy Khouri
  • 2
    I am not sure why everyone is not triple staring your solution, it is what everyone is looking for. Take the original c app and convert it to run in memory and return a byte array. EXCELLENT work! – Dave May 06 '15 at 16:24
  • 1
    Nuget package always fails to install and compiled dll always give error of missing assembly or reference – SMUsamaShah Feb 22 '16 at 17:37
  • @LifeH2O which nuget package? I can't see one for this project. – Ergwun Apr 12 '16 at 06:08
  • @LifeH2O Thanks. I get same failure to install :( – Ergwun Apr 12 '16 at 06:28
  • @LifeH2O Nuget doesn't work. I get the following error when trying to install: http://pastebin.com/9RVxTeB3 – slayernoah Nov 08 '16 at 20:34
7

Check out the C# wrapper library (using P/Invoke) for the wkhtmltopdf library: https://github.com/pruiz/WkHtmlToXSharp

Jason S
5

There are many reasons why this is generally a bad idea. How are you going to control the executables that get spawned off but end up living on in memory if there is a crash? What about denial-of-service attacks, or if something malicious gets into TestPDF.htm?

My understanding is that the ASP.NET user account will not have the rights to log on locally. It also needs to have the correct file permissions to access the executable and to write to the file system. You need to edit the local security policy and let the ASP.NET user account (maybe ASPNET) log on locally (it may be in the deny list by default). Then you need to edit the permissions on the NTFS filesystem for the other files. If you are in a shared hosting environment it may be impossible to apply the configuration you need.

The best way to use an external executable like this is to queue jobs from the ASP.NET code and have some sort of service monitor the queue. If you do this you will protect yourself from all sorts of bad things happening. The maintenance issues with changing the user account are not worth the effort in my opinion, and whilst setting up a service or scheduled job is a pain, it's just a better design. The ASP.NET page should poll a result queue for the output and you can present the user with a wait page. This is acceptable in most cases.
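As a rough illustration of the queue-and-poll shape described above, here is a minimal in-memory sketch. It is an assumption-laden stand-in: a real deployment would put the queue in MSMQ or a database table and run the worker in a separate Windows service, whereas this version keeps everything in one process. The class name `PdfJobQueue` and the `convert` delegate are hypothetical, not part of any library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch: an in-memory job queue with a single background worker.
public sealed class PdfJobQueue : IDisposable
{
    private readonly BlockingCollection<Tuple<Guid, string>> _jobs =
        new BlockingCollection<Tuple<Guid, string>>();
    private readonly ConcurrentDictionary<Guid, byte[]> _results =
        new ConcurrentDictionary<Guid, byte[]>();
    private readonly Task _worker;

    // 'convert' is whatever actually produces the PDF bytes for a URL
    // (for example, a method that runs wkhtmltopdf).
    public PdfJobQueue(Func<string, byte[]> convert)
    {
        _worker = Task.Run(() =>
        {
            // Blocks until a job arrives; ends when CompleteAdding is called.
            foreach (var job in _jobs.GetConsumingEnumerable())
            {
                try
                {
                    _results[job.Item1] = convert(job.Item2);
                }
                catch (Exception)
                {
                    _results[job.Item1] = null; // null marks a failed job
                }
            }
        });
    }

    // Called by the page: queue the job and hand back a ticket.
    public Guid Enqueue(string url)
    {
        var id = Guid.NewGuid();
        _jobs.Add(Tuple.Create(id, url));
        return id;
    }

    // Called by the wait page when polling: true once the job has finished.
    public bool TryGetResult(Guid id, out byte[] pdf)
    {
        return _results.TryRemove(id, out pdf);
    }

    public void Dispose()
    {
        _jobs.CompleteAdding();
        _worker.Wait();
        _jobs.Dispose();
    }
}
```

The page would call `Enqueue` and show a wait page that polls `TryGetResult` with the returned ticket. Because only the worker starts the external process, at most one conversion runs at a time.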

Brian Lyttle
  • Hi, understood. Can you suggest a better way? – Sean Aug 26 '09 at 01:52
  • 1
    MSMQ + Windows Services is the general approach. – Noon Silk Aug 26 '09 at 01:58
  • To follow up on that, either search around, or I've described it briefly here: http://stackoverflow.com/questions/1317641/queue-based-background-processing-in-asp-net-mvc-web-application – Noon Silk Aug 26 '09 at 02:02
  • MSMQ + Windows Services is a specific approach. You can often implement something with SQL Server if you don't know how to use MSMQ or don't want to take a dependency on it. The general thing to look for is queuing systems, of which MSMQ is just one. – Brian Lyttle Aug 26 '09 at 02:13
  • You probably shouldn't give the ASP.NET user account any extra rights, it could be a security issue. If possible you should impersonate for just this action, creating a special account with very limited permissions. – Yuriy Faktorovich Aug 26 '09 at 03:39
5

You can tell wkhtmltopdf to send its output to stdout by specifying "-" as the output file. You can then read the output from the process into the response stream and avoid the permissions issues with writing to the file system.
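A minimal sketch of that idea follows, assuming wkhtmltopdf is installed at the path shown (a hypothetical location, adjust it for your server). The class and method names are illustrative only.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class PdfStdoutSketch
{
    // Assumed install location; not something wkhtmltopdf guarantees.
    private const string WkhtmltopdfPath =
        @"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe";

    // "-q" keeps wkhtmltopdf quiet; the trailing "-" is the output file
    // position and means "write the PDF to stdout".
    public static string BuildArguments(string url)
    {
        return "-q \"" + url + "\" -";
    }

    // Runs wkhtmltopdf and copies the PDF bytes into 'destination'.
    // Returns the process exit code.
    public static int RenderToStream(string url, Stream destination)
    {
        var psi = new ProcessStartInfo
        {
            FileName = WkhtmltopdfPath,
            Arguments = BuildArguments(url),
            UseShellExecute = false,       // required to redirect streams
            CreateNoWindow = true,
            RedirectStandardOutput = true
        };

        using (var p = Process.Start(psi))
        {
            // The PDF is binary, so copy the raw stream rather than
            // going through the text-decoding StandardOutput reader.
            p.StandardOutput.BaseStream.CopyTo(destination);
            p.WaitForExit();
            return p.ExitCode;
        }
    }
}
```

In a page, this could be used by setting `Response.ContentType = "application/pdf"` and then calling `PdfStdoutSketch.RenderToStream(url, Response.OutputStream)`, so no file is ever written on the server.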

Graham Ambrose
3

My take on this with 2018 stuff.

I am using async. I am streaming to and from wkhtmltopdf. I created a new StreamWriter because wkhtmltopdf is expecting utf-8 by default but it is set to something else when the process starts.

I didn't include a lot of arguments since those vary from user to user. You can add what you need using additionalArgs.

I removed p.WaitForExit(...) since I wasn't handling the case where it fails, and it would hang anyway on await tStandardOutput. If a timeout is needed, you would have to call Wait(...) on the different tasks with a cancellation token or timeout and handle it accordingly.

public async Task<byte[]> GeneratePdf(string html, string additionalArgs)
{
    ProcessStartInfo psi = new ProcessStartInfo
    {
        FileName = @"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe",
        UseShellExecute = false,
        CreateNoWindow = true,
        RedirectStandardInput = true,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        Arguments = "-q -n " + additionalArgs + " - -"
    };

    using (var p = Process.Start(psi))
    using (var pdfStream = new MemoryStream())
    using (var utf8Writer = new StreamWriter(p.StandardInput.BaseStream, 
                                             Encoding.UTF8))
    {
        await utf8Writer.WriteAsync(html);
        utf8Writer.Close();
        var tStandardOutput = p.StandardOutput.BaseStream.CopyToAsync(pdfStream);
        var tStandardError = p.StandardError.ReadToEndAsync();

        await tStandardOutput;
        string errors = await tStandardError;

        if (!string.IsNullOrEmpty(errors)) { /* deal/log with errors */ }

        return pdfStream.ToArray();
    }
    }
}

Things I haven't included in there but could be useful if you have images, css or other stuff that wkhtmltopdf will have to load when rendering the html page:

  • you can pass the authentication cookie using --cookie
  • in the header of the html page, you can set the base tag with href pointing to the server and wkhtmltopdf will use that if need be
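The two tips above could be sketched as small helpers like these. Both method names are hypothetical. The base-tag helper only looks for a plain `<head>` tag without attributes, which is an assumption about the input HTML.

```csharp
using System;

public static class WkhtmlHelpersSketch
{
    // Builds a "--cookie <name> <value>" switch to append to additionalArgs,
    // so wkhtmltopdf sends the cookie when it fetches images, CSS, etc.
    public static string CookieArgs(string name, string value)
    {
        return "--cookie \"" + name + "\" \"" + value + "\" ";
    }

    // Inserts <base href="..."> right after <head> so relative URLs in the
    // HTML resolve against the server. If there is no plain <head> tag,
    // the base tag is simply placed at the start of the document.
    public static string InsertBaseTag(string html, string baseUrl)
    {
        string tag = "<base href=\"" + baseUrl + "\" />";
        int head = html.IndexOf("<head>", StringComparison.OrdinalIgnoreCase);
        if (head < 0)
        {
            return tag + html;
        }
        return html.Insert(head + "<head>".Length, tag);
    }
}
```

For example, `GeneratePdf(WkhtmlHelpersSketch.InsertBaseTag(html, "http://myserver/"), WkhtmlHelpersSketch.CookieArgs(cookieName, cookieValue))` would combine both, where `http://myserver/` and the cookie variables are placeholders.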
Yepeekai
2

Thanks for the question / answer / all the comments above. I came upon this when I was writing my own C# wrapper for WKHTMLtoPDF and it answered a couple of the problems I had. I ended up writing about this in a blog post - which also contains my wrapper (you'll no doubt see the "inspiration" from the entries above seeping into my code...)

Making PDFs from HTML in C# using WKHTMLtoPDF

Thanks again guys!

fujiFX
John Reilly
0

The ASP.NET process probably doesn't have write access to the directory.

Try telling it to write to %TEMP%, and see if it works.

Also, make your ASP.NET page echo the process's stdout and stderr, and check for error messages.
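A minimal sketch of both suggestions, with hypothetical names throughout: `TempPdfPath` builds an output path under the temp directory, and `RunAndCapture` returns everything the process wrote to stdout and stderr so the page can display it.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class PdfDiagnosticsSketch
{
    // A unique output path under the temp directory (%TEMP% on Windows).
    public static string TempPdfPath()
    {
        return Path.Combine(Path.GetTempPath(),
                            Guid.NewGuid().ToString("N") + ".pdf");
    }

    // Runs the executable and returns its stdout and stderr as one report.
    public static string RunAndCapture(string exePath, string arguments,
                                       out int exitCode)
    {
        var stderr = new StringBuilder();
        using (var p = new Process())
        {
            p.StartInfo.FileName = exePath;
            p.StartInfo.Arguments = arguments;
            p.StartInfo.UseShellExecute = false;   // required for redirection
            p.StartInfo.CreateNoWindow = true;
            p.StartInfo.RedirectStandardOutput = true;
            p.StartInfo.RedirectStandardError = true;

            // Collect stderr through events while stdout is read directly,
            // so neither pipe can fill up and block the child process.
            p.ErrorDataReceived += (sender, e) =>
            {
                if (e.Data != null) stderr.AppendLine(e.Data);
            };

            p.Start();
            p.BeginErrorReadLine();
            string stdout = p.StandardOutput.ReadToEnd();
            p.WaitForExit(); // also waits for pending stderr events
            exitCode = p.ExitCode;

            return "stdout:\n" + stdout + "\nstderr:\n" + stderr;
        }
    }
}
```

The page could then write the HTML-encoded report to the response while debugging, and check whether a file exists at the temp path afterwards.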

SLaks
  • Not sure, wasn't me. Thanks for the info though, will test it out. Seems I should go about a different way to create pdf files from html though. – Sean Aug 26 '09 at 01:53
  • there are .NET wrappers for it, http://csharp-source.net/open-source/pdf-libraries came from a quick google search – Yuriy Faktorovich Aug 26 '09 at 03:41
0

Generally the return code is 0 if the PDF file is created properly. If it's not created, the value is in the negative range.

Sukanya
-1
using System;
using System.Diagnostics;
using System.Web;

public partial class pdftest : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {

    }
    private void fn_test()
    {
        try
        {
            string url = HttpContext.Current.Request.Url.AbsoluteUri;
            Response.Write(url);
            ProcessStartInfo startInfo = new ProcessStartInfo();
            startInfo.FileName = 
                @"C:\PROGRA~1\WKHTML~1\wkhtmltopdf.exe";//"wkhtmltopdf.exe";
            startInfo.Arguments = url + @" C:\test"
                 + Guid.NewGuid().ToString() + ".pdf";
            Process.Start(startInfo);
        }
        catch (Exception ex)
        {
            string xx = ex.Message.ToString();
            Response.Write("<br>" + xx);
        }
    }
    protected void btn_test_Click(object sender, EventArgs e)
    {
        fn_test();
    }
}
Chris Marisic
Peter