I ran into a problem when I tried this answer in VB.NET (slightly modified):

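Imports System.IO
Imports System.Net
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Function HtmlToPDF(ByVal Url As String) As MemoryStream
     ' Download the page markup as a string
     Dim wc As New WebClient
     Dim htmlText = wc.DownloadString(Url)

     Dim msOutput As New MemoryStream
     Dim reader As New StringReader(htmlText)

     Dim document As New Document(PageSize.A4, 30, 30, 30, 30)

     ' Bind the PDF writer to the in-memory output stream
     PdfWriter.GetInstance(document, msOutput)

     Dim worker As New HTMLWorker(document)

     document.Open()

     worker.StartDocument()

     ' The exception is thrown here
     worker.Parse(reader)

     worker.EndDocument()
     worker.Close()
     document.Close()

     Return msOutput
End Function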

I got the error "The path is not of a legal form" on this line:

worker.Parse(reader)

When I inspect reader in the debugger it shows Nothing/NULL, even though htmlText has a value.

Then I tried different code, but the error is the same as before. This is the code:

Function HtmlToPDF(ByVal Url As String) As MemoryStream
     Dim wc As New WebClient
     Dim htmlText = wc.DownloadString(Url)

     Dim msOutput As New MemoryStream

     Dim document As New Document(PageSize.A4, 30, 30, 30, 30)

     PdfWriter.GetInstance(document, msOutput)

     document.Open()

     Response.Write(htmlText)

     Dim htmlArrayList As New List(Of IElement)
     htmlArrayList = HTMLWorker.ParseToList(New StringReader(htmlText), Nothing)

     For k As Integer = 0 To htmlArrayList.Count - 1 ' To is inclusive, so stop at Count - 1
        document.Add(htmlArrayList(k))
     Next

     document.Close()

     Return msOutput
End Function

The same error occurs in this code, on this line: HTMLWorker.ParseToList(New StringReader(htmlText), Nothing)

Where is the problem? VB.NET newbie here, thanks in advance.

andrefadila

1 Answer

Check the paths in any image tags in the HTML, and also check for things like <hr> tags.

I'm currently getting the same error because a file referenced in some HTML I'm trying to parse doesn't actually exist on the server. For example, "/images/logo.png" doesn't exist on my development server, and I get the error.

Also be aware that the HTML parser can't handle relative URLs: the src of any img tag has to be an absolute URL. See:

http://kuujinbo.info/iTextSharp/tableWithImageToPdf.aspx
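
As a rough illustration of the absolute-URL point, here is a minimal sketch that rewrites img src values before the HTML is handed to the parser. The names ImageUrlFixer and MakeImageSourcesAbsolute and the example.com address are made up for this example, and the regex is a simplification that assumes quoted src attributes; it is not part of iTextSharp and will throw a UriFormatException on a malformed src.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ImageUrlFixer
    ' Hypothetical helper: resolves every quoted <img src="..."> value
    ' against baseUrl so the parser only ever sees absolute URLs.
    Function MakeImageSourcesAbsolute(ByVal html As String, ByVal baseUrl As String) As String
        Dim baseUri As New Uri(baseUrl)
        ' Group 1: everything up to and including the opening quote of src
        ' Group 2: the src value itself
        ' Group 3: the closing quote
        Dim pattern As String = "(<img\b[^>]*?\bsrc\s*=\s*[""'])([^""']+)([""'])"
        Return Regex.Replace(html, pattern,
            Function(m As Match)
                ' Uri(base, relative) leaves already-absolute values unchanged
                Dim absolute As New Uri(baseUri, m.Groups(2).Value)
                Return m.Groups(1).Value & absolute.AbsoluteUri & m.Groups(3).Value
            End Function,
            RegexOptions.IgnoreCase)
    End Function
End Module
```

In the question's function this could be called as htmlText = MakeImageSourcesAbsolute(htmlText, Url) right after DownloadString. It only fixes the URL form; an image that does not exist on the server would still need to be removed or replaced.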

I also get a NullReferenceException with <hr> tags. If you are debugging in Visual Studio, step through until the exception happens, click 'View exception details', and then look at the stack trace. In my example with <hr> I'm getting:

at iTextSharp.text.html.simpleparser.HTMLWorker.CreateLineSeparator(IDictionary`2 attrs)
   at iTextSharp.text.html.simpleparser.HTMLTagProcessors.HTMLTagProcessor_HR.StartElement(HTMLWorker worker, String tag, IDictionary`2 attrs)
   at iTextSharp.text.html.simpleparser.HTMLWorker.StartElement(String tag, IDictionary`2 attrs)
   at iTextSharp.text.xml.simpleparser.SimpleXMLParser.ProcessTag(Boolean start)
   at iTextSharp.text.xml.simpleparser.SimpleXMLParser.Go(TextReader reader)
   at iTextSharp.text.xml.simpleparser.SimpleXMLParser.Parse(ISimpleXMLDocHandler doc, ISimpleXMLDocHandlerComment comment, TextReader r, Boolean html)
   at iTextSharp.text.html.simpleparser.HTMLWorker.Parse(TextReader reader)
   at iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(TextReader reader, StyleSheet style, IDictionary`2 tags, Dictionary`2 providers)
   at iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(TextReader reader, StyleSheet style, Dictionary`2 providers)
   at Health.Management.Web.Site.FileDownload.Download(String content, iTextSharpHeaderFooter hf, Int32 type, Boolean preview, String previewText) in

Incidentally, the old HTMLWorker.Parse is deprecated, and you should use the newer XMLWorker. I only use the old parser as a fallback when trying to process legacy (badly formatted) HTML.

Excellent examples of how to use iTextSharp can be found here:

http://kuujinbo.info/code_index.aspx?tab=2

kolin