export to pdf with html decode

Question

I want to export GridView data to a PDF. One column from the data source contains HTML tags, and I want that HTML to be decoded so that the PDF does not print the literal HTML tags. Here's my code:

In the GridView_RowDataBound event handler:

    for (int i = 0; i < GridView1.Rows.Count; i++)
    {
        if (GridView1.Rows[i].RowType == DataControlRowType.DataRow)
        {
            // Decode the HTML entities in the first six cells of each data row.
            for (int j = 0; j < 6; j++)
            {
                string decodeHTML = HttpUtility.HtmlDecode(GridView1.Rows[i].Cells[j].Text);
                GridView1.Rows[i].Cells[j].Text = decodeHTML;
            }
        }
    }

Then I add the HTML-decoded GridView text to a PDF cell:

    Phrase cellText = new Phrase(GridView1.Rows[i].Cells[j].Text, baseFontNormal);


    iTextSharp.text.pdf.PdfPCell cell = new PdfPCell(cellText);
    if (j == 3) cell.HorizontalAlignment = PdfPCell.ALIGN_CENTER;
    table.AddCell(cell);

Instead of displaying the data in a PDF format, it displays them in an HTML page (in browser). It is displayed as a PDF file only if I remove the GridView_RowDataBound event handler, but then the data is printed with the literal HTML tags, which I don't want.

*"Instead of displaying the data in a PDF format, it displays them in an HTML page (in browser)."* - if an html page is displayed instead of a pdf, you obviously return html and not pdf. So take a look at the code that writes to the response. — mkl, Dec 22 '17 at 08:12
but when I remove (comment out) the code in the GridView_RowDataBound event, it exports as a PDF file as expected. — Chrisantics, Dec 22 '17 at 08:19
Hi @mkl, I think that the OP is making the wrong assumption about the `HtmlDecode` method. — Bruno Lowagie, Dec 22 '17 at 15:50
Yes, because "it displays them in an HTML page" was not an exact description, the page was a pdf page which merely showed some html source... — mkl, Dec 27 '17 at 07:40

score 1 · Accepted Answer · answered Dec 22 '17 at 15:50

You are making the wrong assumption about the HtmlDecode method. You assume that this method can decode HTML, e.g. turn <p>This is <i>italic</i> and this is <b>bold</b>!</p> into something like:

This is *italic* and this is **bold**!

But that is not the case. Take a look at the API documentation on MSDN:

Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.

What does this mean?

It means that you can use HtmlDecode to convert a string like this:

&lt;p&gt;This is &lt;i&gt;italic&lt;/i&gt; and this is &lt;b&gt;bold&lt;/b&gt;!&lt;/p&gt;

Into a string like this:

<p>This is <i>italic</i> and this is <b>bold</b>!</p>

The HtmlDecode method does not remove or interpret the tags. It only converts the entities in the string (such as &lt; or &amp;) back into the characters they represent. After decoding, the tags are ordinary text, so it is normal that you see them in the PDF.
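To see this behavior in isolation, here is a minimal console sketch. It uses `System.Net.WebUtility.HtmlDecode` rather than `HttpUtility.HtmlDecode` so that it runs without a reference to System.Web; that swap is an assumption made for convenience, and for this input both methods should produce the same result. The class name `HtmlDecodeDemo` and the `Decode` wrapper are hypothetical names used only for illustration.

```csharp
using System;
using System.Net;

public static class HtmlDecodeDemo
{
    // Hypothetical thin wrapper so the decoding step is easy to call and check.
    public static string Decode(string encoded)
    {
        // Converts entities such as &lt; and &amp; back into characters.
        // It does not strip or interpret any tags.
        return WebUtility.HtmlDecode(encoded);
    }

    public static void Main()
    {
        string encoded =
            "&lt;p&gt;This is &lt;i&gt;italic&lt;/i&gt; and this is &lt;b&gt;bold&lt;/b&gt;!&lt;/p&gt;";

        // prints: <p>This is <i>italic</i> and this is <b>bold</b>!</p>
        Console.WriteLine(Decode(encoded));
    }
}
```

The output still contains `<p>`, `<i>` and `<b>`, which is exactly the text that ends up in the Phrase when the decoded string is passed to it.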

When you have HTML content, and you want to convert it to iText objects, you need an add-on to make that conversion. See "Converting HTML to PDF using iText" to find out how to do this.
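As a rough sketch of what such a conversion can look like, the snippet below assumes iTextSharp 5 with the separate XML Worker add-on (the itextsharp.xmlworker assembly) is referenced. The names `HtmlCellSketch` and `BuildCell` are hypothetical; the HTML passed in must be the decoded markup (real tags, not entities) and well-formed, since XML Worker expects XHTML-like input. Treat it as an illustration under those assumptions, not as the approach the linked article prescribes.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlCellSketch
{
    // Hypothetical helper: builds a table cell whose content is parsed from HTML
    // instead of being added as a plain-text Phrase.
    public static PdfPCell BuildCell(string html)
    {
        PdfPCell cell = new PdfPCell();

        // Parse the HTML snippet into iText elements (paragraphs, chunks, ...).
        // Passing null for the CSS argument means no extra style sheet is supplied.
        ElementList elements = XMLWorkerHelper.ParseToElementList(html, null);

        // Adding elements switches the cell to composite mode, so the parsed
        // formatting (italic, bold, ...) is rendered instead of printed as tags.
        foreach (IElement element in elements)
        {
            cell.AddElement(element);
        }
        return cell;
    }
}
```

In the question's loop, this would replace the `new PdfPCell(cellText)` line for the column that holds HTML, for example `table.AddCell(HtmlCellSketch.BuildCell(decodedHtml))`, where `decodedHtml` is the HtmlDecode result for that cell.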
