9

I am trying to generate a PDF from HTML using the iTextSharp library. I am able to create the PDF, with the HTML text converted to PDF text/paragraphs.

My problem: the PDF does not show my images. None of the img elements in my HTML are displayed in the PDF. Is it possible for iTextSharp to parse HTML and display images? I really hope so, otherwise I am stuffed :(

I am pointing to the correct directory where the images are (using IMG_BASEURL), but they are just not showing.

My code:

// mainContents variable is a string containing my HTML
var document = new Document(PageSize.A4, 50, 50, 80, 100);
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);
document.Open();

Hashtable providers = new Hashtable();
providers.Add("img_baseurl","C:/users/xx/VisualStudio/Projects/myproject/");
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(mainContents), null, providers);
foreach (var htmlElement in parsedHtmlElements)
   document.Add(htmlElement as IElement);

document.Close();
sazr

3 Answers

11

Every time that I've encountered this the problem was that the image was too large for the canvas. More specifically, even a naked IMG tag internally will get wrapped in a Chunk that will get wrapped in a Paragraph, and I think that the image is overflowing the Paragraph but I'm not 100% sure.

The two easy fixes are to either enlarge the canvas or to specify image dimensions on the HTML IMG tag. The third, more complex route is to use an additional provider, IMG_PROVIDER. To do this you need to implement the IImageProvider interface. Below is a very simple version of one:

    public class ImageThing : IImageProvider {
        //Store a reference to the main document so that we can access the page size and margins
        private Document MainDoc;
        //Constructor
        public  ImageThing(Document doc) {
            this.MainDoc = doc;
        }
        Image IImageProvider.GetImage(string src, IDictionary<string, string> attrs, ChainedProperties chain, IDocListener doc) {
            //Prepend the src tag with our path. NOTE, when using HTMLWorker.IMG_PROVIDER, HTMLWorker.IMG_BASEURL gets ignored unless you choose to implement it on your own
            src = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\" + src;
            //Get the image. NOTE, this will attempt to download/copy the image, you'd really want to sanity check here
            Image img = Image.GetInstance(src);
            //Make sure we got something
            if (img == null) return null;
            //Determine the usable area of the canvas. NOTE, this doesn't take into account the current "cursor" position so this might create a new blank page just for the image
            float usableW = this.MainDoc.PageSize.Width - (this.MainDoc.LeftMargin + this.MainDoc.RightMargin);
            float usableH = this.MainDoc.PageSize.Height - (this.MainDoc.TopMargin + this.MainDoc.BottomMargin);
            //If the downloaded image is bigger than either width and/or height then shrink it
            if (img.Width > usableW || img.Height > usableH) {
                img.ScaleToFit(usableW, usableH);
            }
            //return our image
            return img;
        }
    }
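
The shrink step above comes down to simple arithmetic. As a rough sketch (the `ImageFit` class and `ShrinkFactor` method are hypothetical names, not part of iTextSharp), the factor that a fit-inside scaling has to apply can be computed like this, assuming image and page dimensions are expressed in the same units:

```csharp
using System;

public static class ImageFit
{
    // Hypothetical helper, not an iTextSharp API.
    // Returns the factor (never above 1) by which an image of imgW x imgH
    // must be scaled to fit inside usableW x usableH, keeping its aspect ratio.
    public static float ShrinkFactor(float imgW, float imgH, float usableW, float usableH)
    {
        // The tighter of the two constraints decides the scale.
        float factor = Math.Min(usableW / imgW, usableH / imgH);
        // Never enlarge an image that already fits.
        return Math.Min(factor, 1f);
    }
}
```

For example, assuming an A4 page of 595 x 842 points with the margins used in this post (50, 50, 80, 100), the usable area is 495 x 662, so a 990 x 662 image would need a factor of 0.5, while a 100 x 100 image would be left alone.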

To use this provider just add it to the provider collection like you did with HTMLWorker.IMG_BASEURL:

providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));

It should be noted that if you use HTMLWorker.IMG_PROVIDER that you are responsible for figuring out everything about the image. The code above assumes that all image paths need to be prepended with a constant string, you'll probably want to update this and check for HTTP at the start. Also, because we're saying that we want to completely handle the image processing pipeline the provider HTMLWorker.IMG_BASEURL is no longer needed.
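
A minimal sketch of that HTTP check, assuming you want relative paths resolved against a base folder of your choosing and absolute web URLs passed through untouched. `ImageSourceResolver` and `Resolve` are hypothetical names for illustration, not part of iTextSharp:

```csharp
using System;
using System.IO;

public static class ImageSourceResolver
{
    // Hypothetical helper, not an iTextSharp API.
    // Returns src unchanged when it is already an http/https URL;
    // otherwise treats it as a path relative to baseDir.
    public static string Resolve(string src, string baseDir)
    {
        if (src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return src;
        }
        // Path.Combine returns src itself if src is already a rooted path.
        return Path.Combine(baseDir, src);
    }
}
```

Inside GetImage you could then replace the hard-coded prepend with something like `src = ImageSourceResolver.Resolve(src, basePath);` before calling Image.GetInstance(src), where `basePath` is whatever folder your images live in.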

The main code loop would now look something like this:

        string html = @"<img src=""Untitled-1.png"" />";
        string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
        using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
            using (Document doc = new Document(PageSize.A4, 50, 50, 80, 100)) {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                    doc.Open();
                    using (StringReader sr = new StringReader(html)) {
                        System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
                        providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));

                        var parsedHtmlElements = HTMLWorker.ParseToList(sr, null,  providers);
                        foreach (var htmlElement in parsedHtmlElements) {
                            doc.Add(htmlElement as IElement);
                        }
                    }
                    doc.Close();
                }
            }
        }

One last thing: make sure to specify which version of iTextSharp you are targeting when posting here. The code above targets iTextSharp 5.1.2.0, but I think you might be using the 4.X series.

Chris Haas
  • I'm trying to create and use a class that implements IImageProvider based on this code but it's not working and I believe it has something to do with this line: providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc)); I'm coding in C# and I'm not able to use the HTMLWorker.IMG_PROVIDER enumeration when I add to my hashtable. Does it matter if this value is simply a string? Also maybe I need to do something further from inside the loop to actually execute the GetImage code from the ImageProvider class. I'm using itextsharp 4.1.6, in case that matters. – Neitherman Feb 06 '14 at 14:36
  • @Neitherman, please post a new question with what you tried and what didn't work, referencing this post if it makes sense. – Chris Haas Feb 07 '14 at 14:35
  • Posted new question: http://stackoverflow.com/questions/21684040/imageprovider-not-working-in-html-to-pdf-conversion – Neitherman Feb 10 '14 at 17:32
  • `IImageProvider` is deprecated since 5.5.2 (http://api.itextpdf.com/itext/com/itextpdf/text/html/simpleparser/ImageProvider.html) – aggsol Feb 13 '15 at 15:07
  • `HTMLWorker` itself is deprecated, not just that class. – Chris Haas Feb 14 '15 at 15:51
2

I faced the same problem and tried the following proposed solutions: replacing the tag with a string replace, encoding the image in base64, and embedding the image in a .NET class library, but none of them worked. So I fell back on the old-fashioned solution: adding the logo manually with doc.Add().
Here's your code updated:

        string html = @"<img src=""Untitled-1.png"" />";
        string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
        using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
            using (Document doc = new Document(PageSize.A4, 50, 50, 80, 100)) {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                    doc.Open();
                    using (StringReader sr = new StringReader(html)) {
                        System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
                        providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));

                        var parsedHtmlElements = HTMLWorker.ParseToList(sr, null,  providers);
                        foreach (var htmlElement in parsedHtmlElements) {
                            doc.Add(htmlElement as IElement);
                        }
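                        // here's the magic: add the image to the document directly
                        // (Server.MapPath assumes a web app; the path is specific to this example)
                        var logo = iTextSharp.text.Image.GetInstance(Server.MapPath("~/HTMLTemplate/logo.png"));
                        logo.SetAbsolutePosition(440, 800);
                        doc.Add(logo);
                        // end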
                    }
                    doc.Close();
                }
            }
        }
Fourat
  • To save someone time, you don't need the server.map path. You can use a file path if you aren't using a web app. – Tyler C Apr 27 '17 at 15:28
-1
string siteUrl = HttpContext.Current.Server.MapPath("/images/image/ticket/Ticket.jpg");
string HTML = "<table><tr><td><u>asdasdsadasdsa <img src='" + siteUrl + "' al='tt' /> </u></td></tr></table>";
Andreas Louv
  • It would be nice to know how or why this works, if it does... and I believe this would not work with the current version of iTextSharp as the tag is not supported.
    – azarc3 Aug 22 '14 at 13:26
  • Just got this working, although @GuruRaja's initial code won't work as-is in the current version of iTextSharp. It will work, however, if you remove the `<table>`, `<tr>`, and `<td>`
    container tags AND use a URL as the image source (not a physical server path).
    – azarc3 Aug 22 '14 at 15:55