I am generating HTML code on the fly for a catalog, and I would like to generate a PDF as well. I considered just printing the HTML page to a PDF doc, but I lose some of the background shading and things, and it splits content across pages.

I've read a bit about iText, but I haven't figured out how to format it properly, and I don't know how to make it so it doesn't split my content across pages.

This is the beginning of my HTML page; I included several items so you can see how the content is broken down. I apologize for the ugly HTML; I cannot for the life of me get a div table to look right!

<style type="text/css">
<!--
tr#odd {
    background-color:#e2e2e2;
    vertical-align:top;
}

tr#even {
    vertical-align:top;
}
div#title {
    font-size:16px;
    font-weight:bold;
}

div#mpaa {
    font-size:10px;
}

div#genre {
    font-size:12px;
    font-style:italic;
}

div#plot {
    height: 63px;
    font-size:12px;
    overflow:hidden;
}
-->
</style>

<html>
    <title>Movie Catalog</title>
    <body>
718 Movies
<br />
<br />
        <table>
            <tr id="odd">
                <td>
                    <img src=".\images\10,000BCDVDrip.jpg" width="75" height="110">
                </td>
                <td>
                    <div id="title">10,000 BC</div>
                    <div id="mpaa"> </div>
                    <div id="genre">Adventure, Drama</div>
                    <div id="plot">A prehistoric epic that follows a young mammoth hunter's journey through uncharted territory to secure the future of his tribe.</div>
                </td>
            </tr>

            <tr id="even">
                <td>
                    <img src=".\images\101Dalmatians1961PlatinumEditionDVDRipXviD.jpg" width="75" height="110">
                </td>
                <td>
                    <div id="title">101 Dalmatians (Platinum Edition)</div>
                    <div id="mpaa">G </div>
                    <div id="genre">Comedy, Family, Disney</div>
                    <div id="plot">The Live action adaptation of a Disney Classic. When a litter of dalmatian puppies are abducted by the minions of Cruella De Vil, the parents must find them before she uses them for a diabolical fashion statement.</div>
                </td>
            </tr>

            <tr id="odd">
                <td>
                    <img src=".\images\102DalmationsDVDrip.jpg" width="75" height="110">
                </td>
                <td>
                    <div id="title">102 Dalmations</div>
                    <div id="mpaa">G </div>
                    <div id="genre">Family</div>
                    <div id="plot">After a spot of therapy Cruella De Vil is released from prison a changed woman. Devoted to dogs and good causes, she is delighted that Chloe, her parole officer, has a dalmatian family and connections with a dog charity. But the sound of Big Ben can reverse the treatment so it is only a matter of time before Ms De Vil is back to her incredibly ghastly ways, using her new-found connections with Chloe and friends</div>
                </td>
            </tr>

            <tr id="even">
                <td>
                    <img src=".\images\127Hours2010720pBluRayx264.jpg" width="75" height="110">
                </td>
                <td>
                    <div id="title">127 Hours</div>
                    <div id="mpaa">R Rated R for language and some disturbing violent content/bloody images.</div>
                    <div id="genre">Action, Adventure, Drama, Suspense, Thriller</div>
                    <div id="plot">127 Hours is the true story of mountain climber Aron Ralston's (James Franco) remarkable adventure to save himself after a fallen boulder crashes on his arm and traps him in an isolated canyon in Utah. Over the next five days Ralston examines his life and survives the elements to finally discover he has the courage and the wherewithal to extricate himself by any means necessary, scale a 65 foot wall and hike over eight miles before he is finally rescued. Throughout his journey, Ralston recalls friends, lovers (Clemence Poesy), family, and the two hikers (Amber Tamblyn and Kate Mara) he met before his accident. Will they be the last two people he ever had the chance to meet?</div>
                </td>
            </tr>

            <tr id="odd">
                <td>
                    <img src=".\images\13GoingOn30DVDrip.jpg" width="75" height="110">
                </td>
                <td>
                    <div id="title">13 Going On 30</div>
                    <div id="mpaa">PG-13 for some sexual content and brief drug references</div>
                    <div id="genre">Comedy, Fantasy, Romance</div>
                    <div id="plot">After total humiliation at her thirteenth birthday party, Jenna Rink wants to just hide until she's thirty. Thanks to some wishing dust, Jenna's prayer has been answered. With a knockout body, a dream apartment, a fabulous wardrobe, an athlete boyfriend, a dream job, and superstar friends, this can't be a better life. Unfortunetly, Jenna realizes that this is not what she wanted. The only one that she needs is her childhood best friend, Matt, a boy that she thought destroyed her party. But when she finds him, he's a grown up, and not the same person that she knew.</div>
                </td>
            </tr>
            ...
            ...
        </table>
    </body>
</html>

You can see what it looks like at: http://timelessdesigncafe.com/movies/catalog.html. Notice that the background shading alternates. When I print to PDF I lose the shading, and more importantly, it splits a "row"/movie over two pages, and I need to avoid that.

Thanks in advance!!

Stormenet
Dizzy49
  • You might want to look at reporting engines like Crystal Reports or DevExpress XtraReport. Note that a version of Crystal Reports is included in every VS before 2010, and a no-cost version for 2010 is available from SAP. – Sascha Jul 12 '11 at 07:15

7 Answers

Nobody has mentioned wkhtmltopdf? :)

duedl0r

You can use the OpenOffice API to do this conversion, following these steps in your code:

  • Load the OpenOffice API
  • Open the desired HTML file
  • Save it as PDF

I know it works for VB (I have already used it in VBScripts), C++ and Java; you should be able to do the same thing with C#.

Links:
http://www.kalitech.fr/clients/doc/VB_APIOOo_en.html
http://wiki.services.openoffice.org/wiki/API/Tutorials/PDF_export

Mathieu Rodic

There are many ways to do this. Please check this topic. If you want a free library or tool, you can use iTextSharp, but the free version doesn't cover every requirement. In that case you can use another tool such as ABCPdf.

Peyman

If you are in a position to use WPF, you might want to consider using FixedDocument and doing your print layout in XAML. You can then render the XAML (taking advantage of data-binding if appropriate) to XPS, Microsoft's XML Paper Specification for document layout (essentially their version of PDF).

The advantage of this approach is the ability to leverage data-binding and XAML's (IMHO) superior (to HTML) layout functionality. I have been using this stack as a lightweight reporting solution for a while now. (You need to generate the report on an STA thread).

The next step (yes, this is perhaps getting a bit complicated) would be to pass your XPS stream through a converter to PDF format; I'm not sure whether such a thing exists, however. Otherwise you would be relying on your clients having an XPS reader (although one is built into recent versions of Windows & Office).

James Webster

Laying out HTML properly is a non-trivial task. My estimate is that it would probably take me one or two years to get it right.

So this is not the way to go. Instead, you should filter the HTML for the data and then write a small, dedicated PDF formatter which does exactly what you need and which breaks with even the smallest changes in the input HTML.

That should take a week or so. When you're done with that, make it more resilient to changes in the input HTML.

Aaron Digulla
  • OOo is capable of laying out HTML very reliably. It took me one or two hours to write the fifteen-line VBScript that automatically converts HTML (but also DOC, RTF, etc.) files to PDF. The API is really well documented. – Mathieu Rodic Jul 12 '11 at 07:48
  • I'd be very surprised if OOo implemented more than a tiny fraction of the huge HTML "standard". Moreover, how does it handle browser bugs? Not to mention that no browser is able to print most HTML pages in a reliable way, not only when they contain lots of Flash and ads. So I can't follow your argument unless you're saying "OOo can render a trivial subset of HTML pretty well." In that case, I agree. – Aaron Digulla Jul 12 '11 at 11:34

If you don't mind spending a bit of money you could invest in PrinceXML, which formats any XML document (including XHTML) into a PDF document, applying full layout rules to the HTML content. In fact Prince is more compliant with web standards when doing its layout pass than many web browsers are :)
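
As a rough illustration of the kind of layout rule a paged renderer can act on, the sketch below sets the page geometry with the CSS paged-media `@page` rule. It assumes the renderer honors `@page`; the A4 size and 15mm margin are placeholder values chosen for the example, not anything taken from the question.

```html
<style type="text/css">
/* Sketch only: page geometry for the generated PDF. */
/* Both values are placeholders; adjust them to the catalog's needs. */
@page {
    size: A4;
    margin: 15mm;
}
</style>
```

Browsers generally ignore or only partly apply such rules on screen, so a block like this should not change how the catalog looks in the browser.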

MattDavey

Take a look at WebToPDF.NET, which is a .NET component written in C# that converts HTML to PDF. You will get a PDF file that looks exactly the same as your HTML file. I believe you can specify the page size, so you could specify a very long page to get everything on one page.

The converter supports HTML 4.01, XHTML 1.0, XHTML 1.1 and CSS 2.1 including page breaks, forms and links. It passes all W3C tests (except BIDI).
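
If the converter does honor the CSS 2.1 page-break properties, a rule along the following lines might keep each movie row on a single page. This is a minimal sketch against the `tr#odd` / `tr#even` selectors from the question's stylesheet, not a tested configuration. CSS 2.1 only requires `page-break-inside` to work on block-level elements and leaves support on table rows optional, so whether it takes effect here is an assumption to verify with the tool.

```html
<style type="text/css">
/* Sketch only: ask the renderer not to split a movie row across pages. */
/* Support for this property on table rows is optional in CSS 2.1. */
tr#odd,
tr#even {
    page-break-inside: avoid;
}
</style>
```

If the rows still split, one fallback is to wrap each movie in its own block-level element and apply the same property there, since block-level elements are where CSS 2.1 requires it to work.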

zavolokas