65

I'd like to know whether iTextSharp can convert HTML to PDF. Everything I convert will be plain text, but there is little to no documentation on iTextSharp, so I can't determine whether it is a viable solution for me.

If it can't, can someone point me to some good, free .NET libraries that can take a simple plain-text HTML document and convert it to a PDF?

tia.

Simon Martin
Kyle

9 Answers

65

I came across the same question a few weeks ago, and this is what I found. This method does a quick dump of HTML to a PDF. The document will most likely need some formatting tweaks.

private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);

    // step 1: create the document object
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);

    // step 2:
    // create a writer that listens to the document
    // and writes the PDF bytes to the memory stream
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

    // step 3: create a worker to parse the HTML into the document
    HTMLWorker worker = new HTMLWorker(document);

    // step 4: we open document and start the worker on the document
    document.Open();
    worker.StartDocument();

    // step 5: parse the html into the document
    worker.Parse(reader);

    // step 6: close the document and the worker
    worker.EndDocument();
    worker.Close();
    document.Close();

    return msOutput;
}
meJustAndrew
Jonathan
  • 10
    To save someone else from having to dig through documentation, note that as of 5.1.1, HTMLWorker can be found in iTextSharp.text.html.simpleparser. – James Skemp Dec 02 '11 at 11:59
  • 76
    Why do people never use "using" statements in c# code examples? – cbp Feb 06 '12 at 00:40
  • 5
    @cbp I typically call a method like this in a using statement declaration. ex. `using(MemoryStream stream = createPDF(html)){}` – Jonathan Feb 06 '12 at 14:44
  • 6
    To save even more people, HTMLWorker is now obsolete. Use XMLWorkerHelper.ParseXHtml(), but beware, it only works with XHTML. You have to download it separately as it is designed as an add-on. – Jake Mar 26 '12 at 02:58
  • Jake, I tried the XMLWorkerHelper and it keeps producing empty PDFs. Ran my HTML through an XML parser and it is valid. Any thoughts? – Sam Sep 24 '12 at 23:34
  • TextReader reader = new StringReader(html); throws Object reference not set to an instance of an object. – navule Jan 09 '13 at 13:49
  • hi @Jonathan, can you please help me:http://stackoverflow.com/questions/20950236/how-to-insert-html-markup-using-itextsharp-for-creating-pdf-using-c – SHEKHAR SHETE Jan 06 '14 at 12:51
  • 6
    The `HTMLWorker` class is now obsolete and has been replaced by `XMLWorker`. See https://stackoverflow.com/questions/25164257/how-to-convert-html-to-pdf-using-itextsharp for an in-depth overview on how to use it to render HTML to a PDF, including CSS rendering – mark.monteiro Oct 02 '15 at 20:25
  • Forgive me, but, when you say that this creates a PDF -- where is the PDF saved? – Hugh Seagraves Mar 29 '16 at 21:51
  • @HughSeagraves The method in answer returns a PDF as MemoryStream, he called it msOutput. This can for example be shown in a browser, or in your case saved locally. The syntax for saving would be something like `File.WriteAllBytes("C:/result.pdf", msOutput.ToArray());`, but don't forget to dispose the MemoryStream afterwards... – T_D Apr 01 '16 at 14:24
  • `HTMLWorker` doesn't support string interpolation. – Sirwan Afifi Oct 05 '17 at 15:45
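To make the save-and-dispose advice from the comments above concrete, here is a minimal sketch. `PdfFileSaver` and `Save` are hypothetical names, not part of iTextSharp; the sketch only assumes that the stream already holds the finished PDF bytes. It relies on `MemoryStream.ToArray()` working on a closed stream, which matters because closing the iTextSharp document normally closes the writer's output stream as well (unless `CloseStream` is set to false).

```csharp
using System;
using System.IO;

public static class PdfFileSaver
{
    // Hypothetical helper: writes the PDF bytes held by the stream to a file,
    // then disposes the stream. Returns the number of bytes written.
    public static int Save(MemoryStream pdfStream, string path)
    {
        if (pdfStream == null) throw new ArgumentNullException(nameof(pdfStream));

        using (pdfStream)
        {
            // ToArray copies the whole buffer regardless of Position,
            // and is documented to work even on a closed MemoryStream.
            byte[] bytes = pdfStream.ToArray();
            File.WriteAllBytes(path, bytes);
            return bytes.Length;
        }
    }
}
```

With the `createPDF` method from the answer, a call might look like `PdfFileSaver.Save(createPDF(html), @"C:\result.pdf");`.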
28

After doing some digging I found a good way to accomplish what I need with iTextSharp.

Here is some sample code if it will help anyone else in the future:

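protected void Page_Load(object sender, EventArgs e)
{
    // Download the HTML to convert
    string htmlText;
    using (WebClient wc = new WebClient())
    {
        htmlText = wc.DownloadString("http://localhost:59500/my.html");
    }

    // Echo the downloaded HTML into the response
    Response.Write(htmlText);

    // The using block releases the file handle even if parsing throws;
    // errors are no longer swallowed by an empty catch
    using (FileStream fs = new FileStream("c:\\my.pdf", FileMode.Create))
    {
        Document document = new Document();
        PdfWriter.GetInstance(document, fs);
        document.Open();

        // Parse the HTML into a list of iTextSharp elements and add each one
        List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
        for (int k = 0; k < htmlarraylist.Count; k++)
        {
            document.Add(htmlarraylist[k]);
        }

        document.Close();
    }
}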
Fredrik Hedblad
Kyle
  • 4
    You probably don't want to write your output to a fixed path like you're doing with a web app. You'll get resource contention against that single file under load. Use a MemoryStream or a temp file yielded up by the OS (be sure to delete the temp file when you're done with it). How to create a temp file: http://msdn.microsoft.com/en-us/library/system.io.path.gettempfilename.aspx – ntcolonel Feb 23 '12 at 17:07
  • 2
    Object reference not set to an instance of an object. at List htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null); – navule Jan 09 '13 at 13:50
  • hi @Kyle will u please help me:http://stackoverflow.com/questions/20950236/how-to-insert-html-markup-using-itextsharp-for-creating-pdf-using-c – SHEKHAR SHETE Jan 06 '14 at 12:52
  • Hey, when i give localhost link, i get the error "The remote server returned an error: (401) Unauthorized." Did you face this? – Ensar Turkoglu Mar 28 '16 at 10:15
  • You need to enable permissions on the folder that you are trying to write the new file to. Or if the file already exists, it may be in use by another process. – gunwin Jul 26 '16 at 18:57
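As a rough illustration of the temp-file suggestion in the comments above, the sketch below uses only `System.IO`. `TempFileExample` and `WriteViaTempFile` are hypothetical names; the caller supplies whatever code writes the PDF (for example, the `FileStream` passed to `PdfWriter.GetInstance`). Each request gets its own file, so concurrent requests do not contend for a single fixed path.

```csharp
using System;
using System.IO;

public static class TempFileExample
{
    // Hypothetical helper: hands a fresh temp file path to the caller's
    // writer, reads the result back, and always deletes the temp file.
    public static byte[] WriteViaTempFile(Action<string> writeTo)
    {
        // GetTempFileName creates an empty, uniquely named file on disk.
        string tempPath = Path.GetTempFileName();
        try
        {
            writeTo(tempPath);
            return File.ReadAllBytes(tempPath);
        }
        finally
        {
            // Runs whether or not writeTo threw.
            File.Delete(tempPath);
        }
    }
}
```

Usage might look like `byte[] pdf = TempFileExample.WriteViaTempFile(path => { /* open a FileStream on path and write the PDF */ });`. If no file on disk is needed at all, a `MemoryStream` avoids the cleanup step entirely.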
11

Here's what I was able to get working on version 5.4.2 (from the NuGet install) to return a PDF response from an ASP.NET MVC controller. It could be modified to use a FileStream instead of a MemoryStream for the output if that's what is needed.

I post it here because it is a complete example of current iTextSharp usage for the HTML -> PDF conversion (disregarding images; I haven't looked at that since my usage doesn't require it).

It uses iTextSharp's XMLWorkerHelper, so the incoming HTML must be valid XHTML, and you may need to do some fixup depending on your input.

using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using System.IO;
using System.Web.Mvc;

namespace Sample.Web.Controllers
{
    public class PdfConverterController : Controller
    {
        [ValidateInput(false)]
        [HttpPost]
        public ActionResult HtmlToPdf(string html)
        {           

            html = @"<?xml version=""1.0"" encoding=""UTF-8""?>
                 <!DOCTYPE html 
                     PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
                    ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
                 <html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"" lang=""en"">
                    <head>
                        <title>Minimal XHTML 1.0 Document with W3C DTD</title>
                    </head>
                  <body>
                    " + html + "</body></html>";

            var bytes = System.Text.Encoding.UTF8.GetBytes(html);

            using (var input = new MemoryStream(bytes))
            {
                var output = new MemoryStream(); // this MemoryStream is closed by FileStreamResult

                var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 50, 50, 50, 50);
                var writer = PdfWriter.GetInstance(document, output);
                writer.CloseStream = false;
                document.Open();

                var xmlWorker = XMLWorkerHelper.GetInstance();
                xmlWorker.ParseXHtml(writer, document, input, null);
                document.Close();
                output.Position = 0;

                return new FileStreamResult(output, "application/pdf");
            }
        }
    }
}
µBio
  • Thanks for this, provides crisper PDFs than HtmlRenderer and PDFSharp. I checked, your code supports images. I did this to test html = "" – David Silva Smith Oct 15 '13 at 13:03
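Because the answer above requires XHTML input, it can help to check the markup before handing it to the parser. The following is only a sketch under stated assumptions: `XhtmlCheck` and `IsWellFormed` are hypothetical names, and the check covers XML well-formedness only, not full XHTML validity. With DTD processing ignored, named entities such as `&nbsp;` are reported as errors too.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XhtmlCheck
{
    // Hypothetical helper: returns true when the markup parses as well-formed XML.
    public static bool IsWellFormed(string markup)
    {
        var settings = new XmlReaderSettings
        {
            // Skip any DOCTYPE instead of trying to download the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the parser over the whole input.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, `<p>one<br />two</p>` passes, while `<p>one<br>two</p>` fails because the `br` element is never closed.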
10

I would have upvoted mightymada's answer if I had the reputation. I just implemented an ASP.NET HTML-to-PDF solution using Pechkin, and the results are wonderful.

There is a NuGet package for Pechkin, but as mightymada mentions in their blog (http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/ - I hope they don't mind me reposting it), there's a memory leak that's been fixed in this branch:

https://github.com/tuespetre/Pechkin

The blog has specific instructions for how to include this package (it's a 32-bit DLL and requires .NET 4). Here is my code. The incoming HTML is actually assembled via HTML Agility Pack (I'm automating invoice generation):

public static byte[] PechkinPdf(string html)
{
  //Transform the HTML into PDF
  var pechkin = Factory.Create(new GlobalConfig());
  var pdf = pechkin.Convert(new ObjectConfig()
                          .SetLoadImages(true).SetZoomFactor(1.5)
                          .SetPrintBackground(true)
                          .SetScreenMediaType(true)
                          .SetCreateExternalLinks(true), html);

  //Return the PDF file
  return pdf;
}

Again, thank you mightymada - your answer is fantastic.

Carl Steffen
  • 5
    BEWARE: Pechkin (and TuesPechkin) are superior to iTextSharp in almost all ways (IMHO), except that they don't work in Azure Web Sites (perhaps many shared hosting environments?) – Jay Querido Nov 26 '14 at 20:03
  • Pechkin is a wrapper around wkhtmltopdf, which uses QT Webkit to *render* a web page to a pdf. It's essentially the same as saying "print to PDF" in Safari (a Webkit-based browser). Which is an entirely different use case from creating a PDF file from code. And which is also the *exact opposite* of what the OP asked. So I am downvoting. – Amedee Van Gasse Dec 18 '17 at 10:39
6

I prefer using another library called Pechkin because it can convert non-trivial HTML (including HTML that uses CSS classes). This is possible because the library uses the WebKit layout engine, which is also used by browsers like Chrome and Safari.

I described my experience with Pechkin on my blog: http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/

mightymada
3

iTextSharp can convert an HTML file to PDF.

The required namespaces are:

using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

The following code converts the HTML and sends the PDF as a file download:

// Create a byte array that will eventually hold our final PDF
Byte[] bytes;

// Boilerplate iTextSharp setup here

// Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream())
{
    // Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document())
    {
        // Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            // Open the document for writing
            doc.Open();

            string finalHtml = string.Empty;

            // Load your HTML from a database or file and store it in finalHtml
            // XMLWorker reads from a TextReader, not directly from a string
            using (var srHtml = new StringReader(finalHtml))
            {
                // Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            doc.Close();
        }
    }

    // After all of the PDF "stuff" above is done and closed but **before** we
    // close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

// Clear the response
Response.Clear();
MemoryStream mstream = new MemoryStream(bytes);

// Define response content type
Response.ContentType = "application/pdf";

// Set the download file name in the Content-Disposition header
Response.AddHeader("content-disposition", "attachment;filename=invoice.pdf");
Response.Buffer = true;
mstream.WriteTo(Response.OutputStream);
Response.End();
Christian
Nitin Singh
3

2020 UPDATE:

Converting HTML to PDF is very simple to do now. All you have to do is use NuGet to install itext7 and itext7.pdfhtml. You can do this in Visual Studio by going to "Project" > "Manage NuGet Packages..."

Make sure to include these using directives:

using System.IO;
using iText.Html2pdf;

Now literally just paste this one liner and you're done:

HtmlConverter.ConvertToPdf(new FileInfo(@"temp.html"), new FileInfo(@"report.pdf"));

If you're running this example in Visual Studio, your HTML file should be in the /bin/Debug directory, because the relative paths are resolved against the working directory.

If you're interested, here's a good resource. Also, note that itext7 is licensed under AGPL.

Lemons
3

The above code will certainly help in converting HTML to PDF, but it will fail if the HTML has IMG tags with relative paths. The iTextSharp library does not automatically convert relative paths to absolute ones.

I tried the above code and added code to take care of IMG tags too.

You can find the code here for your reference: http://www.am22tech.com/html-to-pdf/

Anil Gupta
Soan
  • You identify the problem, but the solution you are referencing with the IImageProvider yields the following error `Could not find a part of the path 'C:\intl\en_ALL\images\srpr\logo1w.png'.` when I try to generate PDF by reading HTML from `www.google.com`. – cusman Feb 26 '13 at 21:12
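One way to approach the relative-path problem discussed above is to rewrite the `src` attributes against the page's base URL before the HTML reaches the PDF library. The sketch below is not the code from the linked article; `ImagePathFixer` and `MakeImageSourcesAbsolute` are hypothetical names, and it uses only `System.Uri` and a regular expression, which is brittle for unusual markup (unquoted attributes, for instance, are not handled).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImagePathFixer
{
    // Matches the src attribute of an <img> tag as three groups:
    // everything up to the opening quote, the value, and the closing quote.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\bsrc\s*=\s*[""'])([^""']+)([""'])",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: resolves every img src against baseUri.
    public static string MakeImageSourcesAbsolute(string html, Uri baseUri)
    {
        return ImgSrc.Replace(html, m =>
        {
            Uri absolute;
            // Leave the tag untouched if the value cannot be resolved.
            if (!Uri.TryCreate(baseUri, m.Groups[2].Value, out absolute))
                return m.Value;
            return m.Groups[1].Value + absolute.AbsoluteUri + m.Groups[3].Value;
        });
    }
}
```

With a base of `http://example.com/docs/page.html`, a source of `images/a.png` becomes `http://example.com/docs/images/a.png`, and a root-relative source such as `/logo.png` becomes `http://example.com/logo.png` rather than being treated as a local file path.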
1

If you are converting HTML to PDF on the server side, you can use Rotativa:

Install-Package Rotativa

This is based on wkhtmltopdf, but it has better CSS support than iTextSharp and is very simple to integrate with MVC (which is mostly used), as you can simply return the view as a PDF:

public ActionResult GetPdf()
{
    //...
    return new ViewAsPdf(model);// and you are done!
} 
meJustAndrew