Convert an HTML page to a PDF file using iTextSharp version 5.5.5.0

Question

I want to convert an HTML page to a PDF file from a Windows application.

I have read many articles but have not found a working solution. I am running into a problem with image paths, along with other errors such as "Input string was not in a correct format". Please help me find a solution that I can use in my Windows application.

I am using the following code:

Private Sub Button2_Click_1(sender As Object, e As EventArgs) Handles Button2.Click
    Dim document As New Document()
    Try
        PdfWriter.GetInstance(document, New FileStream(AppDomain.CurrentDomain.BaseDirectory + "\SCRA_Resources\SCRA.pdf", FileMode.Create))
        document.Open()
        Dim wc As New WebClient()
        Dim htmlText As String = wc.DownloadString(AppDomain.CurrentDomain.BaseDirectory + "\SCRA_Resources\SCRA.html")
        Dim htmlarraylist = HTMLWorker.ParseToList(New StringReader(htmlText), Nothing)
        For k As Integer = 0 To htmlarraylist.Count - 1
            document.Add(DirectCast(htmlarraylist(k), IElement))
        Next
        document.Close()
    Catch
    End Try
End Sub

When I run this code, I get the error: Could not find file 'C:\TestProjects\MergePDfs\MergePDfs\bin\Debug\help.gif'.

The images are in the same folder as my HTML file, but HTMLWorker looks for them in bin\Debug instead of in the SCRA_Resources subfolder. It is also not applying the CSS fully.

Please share what you have tried and the specific areas where you are facing an issue — NoviceProgrammer, Apr 28 '15 at 17:03
This is a great place to start: http://stackoverflow.com/questions/25164257/how-to-convert-html-to-pdf-using-itextsharp Make sure to use [XML Worker](http://itextpdf.com/product/xml_worker), **NOT** `HTMLWorker`. If you have a problem with paths to images, you'll need to create your own implementation of the `ImageProvider` interface (see [ParseHtmlImagesLinks](http://itextpdf.com/sandbox/xmlworker/D09_ParseHtmlImagesLinks) vs [ParseHtmlImagesLinksOops](http://itextpdf.com/sandbox/xmlworker/D08_ParseHtmlImagesLinksOops) to compare a working example vs an *Oops, it doesn't work* example). — Bruno Lowagie, Apr 28 '15 at 18:22

score 0 · Answer 1 · answered Apr 29 '15 at 13:50

Let me go through your code to explain a couple of things.

First get rid of your Try and Catch and avoid ever using them in the future. Sounds weird, I know. But everything in code is technically "try this" because every line of code can fail. The only reason to ever use the actual Try command is if you have a valid Catch block that actually does something useful. Logging is one thing. Showing an error message is another, but since you're in VS that's covered already.
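
As a rough sketch of what that can look like (assuming iTextSharp 5.x; the module name, method name, and outputPath parameter are made up for illustration), the same setup without Try/Catch can lean on Using blocks for cleanup and simply let any exception surface in the debugger:

```vbnet
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module NoSwallowedExceptionsSketch
    ' Hypothetical helper: writes a one-paragraph PDF to outputPath.
    Sub WritePdf(outputPath As String)
        ' Using disposes each object even if an exception is thrown,
        ' so no empty Catch block is needed to clean up.
        Using fs As New FileStream(outputPath, FileMode.Create)
            Using document As New Document()
                Using writer = PdfWriter.GetInstance(document, fs)
                    document.Open()
                    document.Add(New Paragraph("Hello"))
                    document.Close()
                End Using
            End Using
        End Using
    End Sub
End Module
```

If something fails here, you see the real exception and its message instead of a silently empty or missing PDF.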

Next are these two lines:

Dim htmlText As String = wc.DownloadString(AppDomain.CurrentDomain.BaseDirectory + "\SCRA_Resources\SCRA.html")
Dim htmlarraylist = HTMLWorker.ParseToList(New StringReader(htmlText), Nothing)

The right part of the first line is "get some HTML from a very specific location" and the left part is "and put that into a variable as a string that's totally unaware of the original specific location". Read this a couple of times if it doesn't make sense because it should explain why the second line can't find the images.

Your image links are all relative but relative to what? I know you want it to be your specific folder but you didn't actually specify that in any way. HTML has (or maybe had, I haven't done this in a decade probably) a way to do this via the base tag but I don't know if iText supports that. So instead you need to tell iText "when I say relative, I mean relative to this folder".
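
To make that concrete, here is a small sketch that uses only System.IO. The folder and file names come from the question; the module name is hypothetical. The parser only ever receives the HTML string, so nothing ties "help.gif" to the HTML file's folder until you combine the two yourself:

```vbnet
Imports System.IO

Module RelativePathSketch
    Sub Main()
        ' Path.Combine inserts separators as needed, which avoids the doubled
        ' backslash that BaseDirectory + "\SCRA_Resources\..." usually produces.
        Dim htmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory,
                                    "SCRA_Resources", "SCRA.html")

        ' The folder that the relative image links are meant to be relative to.
        Dim htmlFolder = Path.GetDirectoryName(htmlPath)

        ' What <img src="help.gif"> should resolve to.
        Dim resolvedImage = Path.Combine(htmlFolder, "help.gif")
        Console.WriteLine(resolvedImage)
    End Sub
End Module
```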

Before continuing, it is important to understand that you are using a very old, officially obsoleted and no longer supported helper class that lacks many features and will eventually cause you a lot of grief. The HTMLWorker class was replaced with the XMLWorker class many years ago. Although the HTMLWorker class sounds like something that's more appropriate, think of the XMLWorker as "XHTML" instead of "XML".
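
For comparison, a minimal XMLWorker sketch might look like the following. It assumes iTextSharp 5.x plus the separate itextsharp.xmlworker assembly, and the module, method, and parameter names are hypothetical. It is only the basic call: the input has to be well-formed markup, and relative image paths still need their own handling, as the comment on the question points out.

```vbnet
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.tool.xml

Module XmlWorkerSketch
    ' Hypothetical helper: converts the XHTML file at htmlPath into a PDF at pdfPath.
    Sub ConvertHtmlToPdf(htmlPath As String, pdfPath As String)
        Using fs As New FileStream(pdfPath, FileMode.Create)
            Using document As New Document()
                Using writer = PdfWriter.GetInstance(document, fs)
                    document.Open()
                    Using reader As New StreamReader(htmlPath)
                        ' Parses the markup and adds the resulting elements to the document.
                        XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader)
                    End Using
                    document.Close()
                End Using
            End Using
        End Using
    End Sub
End Module
```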

Okay, so if you're stuck using HTMLWorker, you can solve this by implementing the iTextSharp.text.html.simpleparser.IImageProvider interface. If you do this on the 5.x series you should get a bunch of compiler warnings because, as I said above, HTMLWorker is officially obsolete. The GetImage method of this interface is called for every image in your document. Below is a very simple implementation whose constructor takes a single parameter specifying the folder that relative image paths should be resolved against. Ideally you should add some error handling (this is a good candidate for a Try/Catch because your Catch could substitute an explicit "image not found" image), and if you have a mixture of absolute and relative image paths you should check for that, too.

Public Class RelativeRootImageProvider
    Implements iTextSharp.text.html.simpleparser.IImageProvider

    Public Property BasePath As String

    Public Sub New(basePath As String)
        Me.BasePath = basePath
    End Sub

    Public Function GetImage(src As String,
                             attrs As IDictionary(Of String, String),
                             chain As iTextSharp.text.html.simpleparser.ChainedProperties,
                             doc As IDocListener) As iTextSharp.text.Image Implements iTextSharp.text.html.simpleparser.IImageProvider.GetImage
        ' This should also check whether src is absolute and, if so, try loading it as-is first.
        ' A File.Exists() check on the combined path would be a sensible addition, too.
        Dim newSrc = System.IO.Path.Combine(BasePath, src)
        Return iTextSharp.text.Image.GetInstance(newSrc)
    End Function
End Class

To use RelativeRootImageProvider, create a providers collection and add an instance of the class to it:

' Pick the folder that relative image paths should be resolved against
Dim RelativeImageRootPath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)

' Collection of providers
Dim providers As New System.Collections.Generic.Dictionary(Of String, Object)()

' Add our image provider, pointed at our specific folder
providers.Add(HTMLWorker.IMG_PROVIDER, New RelativeRootImageProvider(RelativeImageRootPath))

And then pass the providers as the third parameter of the ParseToList method:

Dim htmlarraylist = HTMLWorker.ParseToList(New StringReader(htmlText), Nothing, providers)
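
The checks mentioned earlier (absolute versus relative paths, and whether the file exists) could be sketched roughly as below. The class name CheckedImageProvider is hypothetical, and throwing a descriptive exception is a simpler stand-in for the "image not found" placeholder idea, not the only reasonable choice:

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.html.simpleparser

' Hypothetical variant of RelativeRootImageProvider with basic checks added.
Public Class CheckedImageProvider
    Implements IImageProvider

    Private ReadOnly _basePath As String

    Public Sub New(basePath As String)
        _basePath = basePath
    End Sub

    Public Function GetImage(src As String,
                             attrs As IDictionary(Of String, String),
                             chain As ChainedProperties,
                             doc As IDocListener) As iTextSharp.text.Image Implements IImageProvider.GetImage
        ' Remote images are passed through untouched.
        Dim isRemote = src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) OrElse
                       src.StartsWith("https://", StringComparison.OrdinalIgnoreCase)

        Dim resolved As String = src
        If Not isRemote Then
            ' Only relative paths are re-rooted onto the base folder.
            If Not Path.IsPathRooted(src) Then
                resolved = Path.Combine(_basePath, src)
            End If

            ' Fail with a message that names the path that was actually tried.
            If Not File.Exists(resolved) Then
                Throw New FileNotFoundException("Image referenced by the HTML was not found.", resolved)
            End If
        End If

        Return iTextSharp.text.Image.GetInstance(resolved)
    End Function
End Class
```

It registers the same way as the simpler class: add an instance to the providers dictionary under HTMLWorker.IMG_PROVIDER.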
