Tables in PDF with horizontal page breaks

Question

Does someone know a (preferably open-source) PDF layout engine for Java, capable of rendering tables with horizontal page breaks? "Horizontal page breaking" is at least how the feature is named in BIRT, but to clarify: If a table has too many columns to fit across the available page width, I want the table to be split horizontally across multiple pages, e.g. for a 10-column table, the columns 1-4 to be output on the first page and columns 5-10 on the second page. This should of course also be repeated on the following pages, if the table has too many rows to fit vertically on one page.

So far, it has been quite difficult to search for products. I reckon that such a feature may be named differently in other products, making it difficult to use aunt Google to find a suitable solution.

So far, I've tried:

BIRT claims to support this, but the actual implementation is so buggy that it cannot be used. I thought it was self-evident for such functionality that the row height is kept consistent across all pages, making it possible to align the rows when placing the pages next to each other. BIRT, however, calculates the required row height separately for each page.
Jasper has no support.
I also considered Apache FOP, but I can't find any suitable syntax for this in the XSL-FO specification.
iText is generally a little too "low level" for this task anyway (making it difficult to lay out other parts of the intended PDF documents), and it does not seem to offer support either.

Since there seem to be dozens of other reporting or layout engines that may or may not fit, and since I find it difficult to guess exactly what to look for, I was hoping that someone has already had similar requirements and can at least point me in the right direction. It is relatively important that the product can be easily integrated into a Java server application; a native Java library would be ideal.

Expected Layout

Now, to keep the rows aligned across all pages, the row heights must be calculated as follows:

Row1.height = max(A1.height, B1.height, C1.height, D1.height)
Row2.height = max(A2.height, B2.height, C2.height, D2.height)

While BIRT currently seems to do something like:

Page1.Row1.height = max(A1.height, B1.height)
Page2.Row1.height = max(C1.height, D1.height)
Page1.Row2.height = max(A2.height, B2.height)
Page2.Row2.height = max(C2.height, D2.height)
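
The difference between the two calculations can be sketched in plain Java. This is only an illustration: the class and method names (`RowHeights`, `consistentRowHeights`, `perPageRowHeights`) and the cell heights are made up, and nothing here is BIRT or iText API.

```java
import java.util.Arrays;

final class RowHeights {
    /** Desired behaviour: each row is as tall as its tallest cell across ALL columns. */
    static float[] consistentRowHeights(float[][] cellHeights) {
        return perPageRowHeights(cellHeights, 0, Integer.MAX_VALUE);
    }

    /** Per-page behaviour: only the columns [fromCol, toCol) on one page are considered. */
    static float[] perPageRowHeights(float[][] cellHeights, int fromCol, int toCol) {
        float[] heights = new float[cellHeights.length];
        for (int row = 0; row < cellHeights.length; row++) {
            int end = Math.min(toCol, cellHeights[row].length);
            for (int col = fromCol; col < end; col++) {
                heights[row] = Math.max(heights[row], cellHeights[row][col]);
            }
        }
        return heights;
    }

    public static void main(String[] args) {
        // Hypothetical cell heights for rows 1-2 and columns A-D.
        float[][] cells = {
            {30f, 20f, 10f, 10f},
            {10f, 40f, 10f, 20f}
        };
        System.out.println(Arrays.toString(consistentRowHeights(cells)));    // [30.0, 40.0]
        System.out.println(Arrays.toString(perPageRowHeights(cells, 0, 2))); // [30.0, 40.0]
        System.out.println(Arrays.toString(perPageRowHeights(cells, 2, 4))); // [10.0, 20.0]
    }
}
```

With these sample values the second page (columns C and D) gets rows of height 10 and 20 under the per-page calculation, so they no longer line up with the 30 and 40 used on the first page.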

Second Layout

Are your column widths highly dynamic/variable? I mean do you know if it is one or two columns that widen to cause the horizontal-break requirement or if it could be any column? Or is it that you have a variable number of columns? — Paul Jowett, Mar 15 '13 at 02:00
@Dave Jarvis: No, but if Jasper cannot layout such tables, I don't understand why DynamicJasper should? Or have I missed something? Just as a notice: I don't need someone to point me to arbitrary reporting engines here, that I can google myself. — jarnbjo, Mar 15 '13 at 12:41
@jowierun: The column count is fixed and it would also be acceptable to have fixed column widths. — jarnbjo, Mar 15 '13 at 12:42
Can you then solve the problem by having multiple tables? eg table1 has the first 5 columns and table2 has the second 5 columns? — Paul Jowett, Mar 15 '13 at 13:51
@jowierun: No. At least not unless I am not seeing some obvious way to get consistent row heights across all tables. Even if the column widths can be fixed, the row heights must be calculated dynamically depending on the content, but consistent for each row across all pages. — jarnbjo, Mar 15 '13 at 13:58
@jambo: DynamicJasper adds "breaking groups"; I would not be surprised if you can inject custom logic to split columns across pages, possibly by counting the columns yourself and performing a quick calculation to see if they'll fit on the current page. "Columns can be defined at runtime, which means you also control (at runtime) the column positioning, width, title, etc." — Dave Jarvis, Mar 15 '13 at 15:39
@jarnbjo: Two pages showing how the columns should be split. It's nice to see a visual representation of the issue when it comes to reports. — Dave Jarvis, Mar 15 '13 at 17:46
@Dave: You can check the thread I started on the BIRT forum. You'll find a few drawings, where the expected layout should be explained thoroughly: http://www.eclipse.org/forums/index.php/m/1009566/ — jarnbjo, Mar 15 '13 at 17:56
I bet that this is possible in iText. Here's a mailing list discussion that seems to be addressing the same problem: http://itext-general.2136553.n4.nabble.com/Building-a-pdf-with-tables-that-break-across-multiple-pages-td2167032.html — ach, Mar 15 '13 at 18:13
@ach: Almost. Using iText and PdfPTable.writeSelectedRows, I can actually split the table horizontally and the row height is kept across all pages. The problem, however, is that writeSelectedRows only allows vertical page breaks between rows. If /one/ cell is too tall to fit on one page, there seems to be no way to render it. — jarnbjo, Mar 18 '13 at 16:36
I've posted an example that uses `iText` and `writeSelectedRows` to split both horizontally and vertically. — dcernahoschi, Apr 11 '13 at 20:52

dcernahoschi · Answer 1 · 2013-04-12T15:01:24.193

It's possible to display a table the way you want with iText. You need to use custom table positioning and custom row and column writing.

I was able to adapt this iText example to write across multiple pages both horizontally and vertically. The idea is to remember the first and last row that fit vertically on each page. I've posted the whole code so you can run it easily.

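// Imports assume iText 5.x (com.itextpdf.* packages).
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfPRow;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Main {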
    public static final String RESULT = "results/part1/chapter04/zhang.pdf";

    public static final float PAGE_HEIGHT = PageSize.A4.getHeight() - 100f;

    public void createPdf(String filename)
            throws IOException, DocumentException {

        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer
                = PdfWriter.getInstance(document, new FileOutputStream(filename));
        // step 3
        document.open();

        //setup of the table: first row is a really tall one
        PdfPTable table = new PdfPTable(new float[] {1, 5, 5, 1});

        StringBuilder sb = new StringBuilder();

        for(int i = 0; i < 50; i++) {
            sb.append("tall text").append(i + 1).append("\n");
        }

        for(int i = 0; i < 4; i++) {
            table.addCell(sb.toString());
        }

        for (int i = 0; i < 50; i++) {
            sb = new StringBuilder("some text");
            table.addCell(sb.append(i + 1).append(" col1").toString());

            sb = new StringBuilder("some text");
            table.addCell(sb.append(i + 1).append(" col2").toString());

            sb = new StringBuilder("some text");
            table.addCell(sb.append(i + 1).append(" col3").toString());

            sb = new StringBuilder("some text");
            table.addCell(sb.append(i + 1).append(" col4").toString());
        }

        // set the total width of the table
        table.setTotalWidth(600);
        PdfContentByte canvas = writer.getDirectContent();

        ArrayList<PdfPRow> rows = table.getRows();

        //check every row's height and split the row once if it is taller than the page height
        //(a row 2, 3, ... n times taller than the page would need repeated splitting)
        for (int i = 0; i < rows.size(); i++) {
            PdfPRow currentRow = rows.get(i);

            float rowHeight = currentRow.getMaxHeights();

            if(rowHeight > PAGE_HEIGHT) {
                PdfPRow newRow = currentRow.splitRow(table,i, PAGE_HEIGHT);
                if(newRow != null) {
                    rows.add(++i, newRow);
                }
            }
        }

        List<Integer[]> chunks = new ArrayList<Integer[]>();

        int startRow = 0;
        int endRow = 0;
        float chunkHeight = 0;

        //determine how many rows fit on one page vertically
        //and remember the first and last row that fit on each page
        for (int i = 0; i < rows.size(); i++) {
            PdfPRow currentRow = rows.get(i);

            chunkHeight += currentRow.getMaxHeights();

            endRow = i;   

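            //verify against some desired height; the endRow > startRow guard keeps
            //a single row that is still too tall from producing empty chunks forever
            if (chunkHeight > PAGE_HEIGHT && endRow > startRow) {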
                //remember start and end row
                chunks.add(new Integer[]{startRow, endRow});
                startRow = endRow;
                chunkHeight = 0;
                i--;
            }
        }

        //last pair
        chunks.add(new Integer[]{startRow, endRow + 1});

        //render each pair of startRow - endRow on 2 pages horizontally, get to the next page for the next pair
        for(Integer[] chunk : chunks) {
            table.writeSelectedRows(0, 2, chunk[0], chunk[1], 236, 806, canvas);
            document.newPage();
            table.writeSelectedRows(2, -1, chunk[0], chunk[1], 36, 806, canvas);

            document.newPage();
        }


        document.close();
    }

    public static void main(String[] args) throws IOException, DocumentException {
        new Main().createPdf(RESULT);
    }
}
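
The page layout that the loops above produce can also be worked out without iText, which makes it easier to check. The following is a minimal sketch, assuming fixed column widths and already-known row heights; `PagePlan`, `partition` and `plan` are hypothetical names, not part of iText. Each resulting entry holds the first four arguments one would pass to `writeSelectedRows` (colStart, colEnd, rowStart, rowEnd).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagePlan {
    /** Greedily groups consecutive sizes into [start, end) runs whose sum stays within limit. */
    static List<int[]> partition(float[] sizes, float limit) {
        List<int[]> runs = new ArrayList<>();
        int start = 0;
        float used = 0f;
        for (int i = 0; i < sizes.length; i++) {
            // Close the current run when item i would overflow; an oversized item still gets its own run.
            if (i > start && used + sizes[i] > limit) {
                runs.add(new int[] {start, i});
                start = i;
                used = 0f;
            }
            used += sizes[i];
        }
        if (start < sizes.length) {
            runs.add(new int[] {start, sizes.length});
        }
        return runs;
    }

    /** One entry per page: {colStart, colEnd, rowStart, rowEnd}; all column groups of a row chunk come first. */
    static List<int[]> plan(float[] colWidths, float pageWidth, float[] rowHeights, float pageHeight) {
        List<int[]> colGroups = partition(colWidths, pageWidth);
        List<int[]> pages = new ArrayList<>();
        for (int[] rows : partition(rowHeights, pageHeight)) {
            for (int[] cols : colGroups) {
                pages.add(new int[] {cols[0], cols[1], rows[0], rows[1]});
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        float[] colWidths = {50f, 250f, 250f, 50f};    // the 1:5:5:1 split of a 600pt table
        float[] rowHeights = {400f, 300f, 200f, 100f}; // made-up row heights
        for (int[] page : plan(colWidths, 300f, rowHeights, 742f)) {
            System.out.println(Arrays.toString(page));
        }
    }
}
```

Because every page of one row chunk uses the same row range and the row heights are computed once for the whole table, the rows stay aligned when the pages are placed side by side.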

I understand that maybe iText is too low level just for reports, but it can be employed beside standard reporting tools for special needs like this.

Update: Rows taller than the page height are now split first. The code doesn't split repeatedly if a row is 2, 3, ..., n times taller than the page, but it can be adapted for this too.
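
For rows that are several times taller than the page, the split has to be repeated until every piece fits. The sketch below models only the heights, under the assumption that a row can be cut at exactly the page height; `TallRows` and `splitHeights` are hypothetical names. With iText, `splitRow` may cut at a different position (for example at a line boundary), so the real pieces can be shorter than in this model.

```java
import java.util.ArrayList;
import java.util.List;

final class TallRows {
    /** Replaces every height above pageHeight by full-page pieces plus a remainder. */
    static List<Float> splitHeights(float[] rowHeights, float pageHeight) {
        if (pageHeight <= 0f) {
            throw new IllegalArgumentException("pageHeight must be positive");
        }
        List<Float> pieces = new ArrayList<>();
        for (float height : rowHeights) {
            float remaining = height;
            // Keep cutting off full pages until the rest fits on one page.
            while (remaining > pageHeight) {
                pieces.add(pageHeight);
                remaining -= pageHeight;
            }
            pieces.add(remaining);
        }
        return pieces;
    }

    public static void main(String[] args) {
        // An 1800pt row on a 742pt page needs two cuts; the 100pt row is untouched.
        System.out.println(splitHeights(new float[] {1800f, 100f}, 742f));
        // prints [742.0, 742.0, 316.0, 100.0]
    }
}
```

In the iText loop this would presumably mean re-checking the newly inserted row instead of skipping it, while guarding against a row that cannot be split any further.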

How do you solve the problem I mentioned in my comment to my original question: writeSelectedRows can obviously only write entire rows to one page. If a single row is too high to fit on one page, how can I apply a page break within that single row? — jarnbjo, Apr 11 '13 at 23:11
Yeah, I overlooked this problem. I think one idea would be to split the tall row to fit the current page and render the new remaining row on the next. I'll try to update the answer later. — dcernahoschi, Apr 12 '13 at 08:07
@jarnbjo OK, updated the code, now splits the tall rows before rendering. — dcernahoschi, Apr 12 '13 at 15:02
This actually seems to do the trick. It's really quite tedious to lay out PDFs with iText, but we're already using iText to concatenate several PDFs from multiple sources into one PDF. If I wrap this up in a slightly more intuitive API, we can use this code to generate at least the problematic table and then include the table in the complete document at the right position between fragments from other sources. Thank you very much! — jarnbjo, Apr 12 '13 at 15:35
@dcernahoschi Any idea how to stop a table from being split when generating a PDF from HTML using iText (latest version)? — Neha Choudhary, Nov 20 '14 at 08:24

score 1 · Answer 2 · edited Apr 10 '13 at 20:30

Same idea as Dev Blanked's, but using wkhtmltopdf (https://code.google.com/p/wkhtmltopdf/) and some JavaScript, you can achieve what you need. Running wkhtmltopdf against this fiddle gives the result shown below (screenshot of the PDF pages). You can place the "break-after" class on any cell of the header row. We use wkhtmltopdf server-side in a Java EE web app to produce dynamic reports, and the performance is actually very good.

HTML

<body>
        <table id="table">
            <thead>
                <tr><th >Header 1</th><th class="break-after">Header 2</th><th>Header 3</th><th>Header 4</th></tr>
            </thead>
            <tbody>
                <tr valign="top">
                    <td>A1<br/>text<br/>text</td>
                    <td>B1<br/>text</td>
                    <td>C1</td>
                    <td>D1</td>
                </tr>
                <tr valign="top">
                    <td>A2</td>
                    <td>B2<br/>text<br/>text<br/>text</td>
                    <td>C2</td>
                    <td>D2<br/>text</td>
                </tr>
            </tbody>
        </table>
    </body>

Script

$(document).ready(function() {
    var thisTable = $('#table'),
        otherTable= thisTable.clone(false, true),
        breakAfterIndex = $('tr th', thisTable).index($('tr th.break-after', thisTable)),
        wrapper = $('<div/>');

    wrapper.css({'page-break-before': 'always'});
    wrapper.append(otherTable);
    thisTable.after(wrapper);
    $('tr', thisTable).find('th:gt(' + breakAfterIndex + ')').remove(); 
    $('tr', thisTable).find('td:gt(' + breakAfterIndex + ')').remove(); 
    $('tr', otherTable).find('th:lt(' + (breakAfterIndex + 1) + ')').remove(); 
    $('tr', otherTable).find('td:lt(' + (breakAfterIndex + 1) + ')').remove();

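    // use thisTable here; a bare `table` only resolves through the browser's implicit global for id="table"
    $('tr', thisTable).each(function(index) {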
        var $this =$(this),
            $otherTr = $($('tr', otherTable).get(index)),
            maxHeight = Math.max($this.height(), $otherTr.height());
        $this.height(maxHeight);
        $otherTr.height(maxHeight);      
    });
});
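
From a Java server application, wkhtmltopdf can be started as an external process. The following is a minimal sketch, assuming the `wkhtmltopdf` binary is on the PATH and that the page containing the table and the script above is reachable as a file or URL; the class name `WkHtmlToPdf` and the file names are placeholders.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class WkHtmlToPdf {
    /** Builds the basic command line: wkhtmltopdf <input> <output>. */
    static List<String> command(String htmlPathOrUrl, String pdfPath) {
        return Arrays.asList("wkhtmltopdf", htmlPathOrUrl, pdfPath);
    }

    /** Runs wkhtmltopdf and returns its exit code (0 normally means success). */
    static int render(String htmlPathOrUrl, String pdfPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(htmlPathOrUrl, pdfPath))
                .inheritIO() // forward the tool's console output to this process
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            System.exit(render(args[0], args[1]));
        }
        // Without arguments, only show the command that would be run.
        System.out.println(String.join(" ", command("report.html", "report.pdf")));
    }
}
```

Running an external process per report keeps the integration simple, but the server then needs the binary installed, which may not suit every deployment.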

Screenshot of the resulting PDF

Actually not a bad idea, had it not been for wkhtmltopdf messing up most of the other layout. There are over 600 open bugs in the bug tracker and the last release is 18 months old. Even rendering a simple text line fails, as seen in this screenshot: http://jarnbjo.de/wkhtml2pdf.png - The first line is the expected output; in the second line (wkhtmltopdf output) both the font size and the letter spacing are incorrect. — jarnbjo, Apr 11 '13 at 09:01
@jarnbo: Agree that rendering HTML to PDF requires some formatting because both do not play in the same coordinate space. We create special standardized HTML pages (with SVG charts) customized for PDF printing (special fonts, charts). Despite the opened issues, we never experienced a crash or failure under heavy testing with several customers. Remember also that the produced PDF rendering depends on your PDF reader, eg: Helvetica fonts are not available on Linux and make the on-screen rendering different than expected. — Guy, Apr 11 '13 at 10:39
Using physical measurement units in the CSS (pt, cm, inch etc), I would expect these to be adopted accordingly in the created PDF. Obviously they are not. I just skimmed through a few of the open bugs and many of them are unfortunately preventing us from using the tool, e.g. page breaking bugs causing page breaks to be inserted in the middle of a text line causing the upper half to be rendered to the current page and lower half to be rendered on the next line. The bugs are actually in the Webkit print engine and not in the wkhtml2pdf tool, but that doesn't solve the problem. — jarnbjo, Apr 11 '13 at 10:59
You're right: there's a coordinate space modification according to some layout algorithms. Maybe you will end up writing your own layout engine based on iText or some other PDF library. Would be great if you find any solution to share with us — Guy, Apr 11 '13 at 12:12

score 0 · Answer 3 · answered Mar 17 '13 at 14:44

Have you tried Flying Saucer (http://code.google.com/p/flying-saucer/)? It is supposed to convert HTML to PDF.

Dev Blanked

No, I haven't. And since there are IMHO no markup features or style attributes in HTML and CSS to describe the required layout, I don't know how Flying Saucer would help. If you know how, please tell. – jarnbjo Mar 18 '13 at 15:47
You can use a normal HTML table and set the width/height of columns and rows to appropriate px values. If you can get the needed structure in a web page using JSP probably flying saucer can generate the required pdf. – Dev Blanked Mar 19 '13 at 07:46
Can you please elaborate. It is not obvious to me, how setting the column width can force page breaks between columns, unless Flying Saucer implements some extra-CSS magic, which it according to my tests does not. If I just specify column widths, which added together exceeds the page width, Flying Saucer either shrinks the columns to make them fit on one page, or if the cell content is too wide, the last fitting column is simply cropped at the right page border. – jarnbjo Mar 19 '13 at 13:23

score 0 · Answer 4 · answered Mar 18 '13 at 10:45

My advice is to use FOP transformer.

Here you can see some examples and how to use it.

Here you can find some examples with fop and tables.

Jordan Borisov

I already wrote in the question why I haven't tried Apache FOP. If you know how to achieve the required layout with XSL-FO, that would suffice, but the examples you are linking to have nothing to do with my actual problem. – jarnbjo Mar 18 '13 at 15:45
+1 for Apache FOP. I used it to generate rich PDF reports with complex grids and a lot of images (with requirements to strictly keep some elements together on one page, etc.). – iMysak Mar 20 '13 at 11:33
Apache FOP is not a bad solution if its functionality fulfills your requirements, you are not affected by any of the many bugs and can live with the rather poor performance. Unless I can use FOP to create the layout I am specifically asking for, it is however not an answer to my question. – jarnbjo Mar 20 '13 at 12:42

score 0 · Answer 5 · answered Apr 10 '13 at 06:24

Jasper has no support.

According to the Jasper documentation it does have support, via:

column break element (i.e. a break element with a type=column attribute). This can be placed at any location in a report.
isStartNewColumn attribute on groups/headers

See http://books.google.com.au/books?id=LWTbssKt6MUC&pg=PA165&lpg=PA165&dq=jasper+reports+%22column+break%22&source=bl&ots=aSKZfqgHR5&sig=KlH4_OiLP-cNsBPGJ7yzWPYgH_k&hl=en&sa=X&ei=h_1kUb6YO6uhiAeNk4GYCw&redir_esc=y#v=onepage&q=column%20break&f=false

If you're really stuck, as a last resort you could use Excel / OpenOffice Calc: manually copy data into cells, manually format it as you desire, save as xls format. Then use apache POI from java to dynamically populate/replace the desired data & print to file/PDF. At least it gives very fine-grained control of column & row formatting/breaks/margins etc.

The column breaks mentioned in the search result you are linking to all seem to refer to text columns and not table columns. If I'm missing something, please clarify. Using Excel or OpenOffice is a possibility, but we would then need to remote control either of the products. Server policies unfortunately do not allow us to install such software, and even if they did, remoting both MS Office and OpenOffice is a real hassle (been there, done that). — jarnbjo, Apr 11 '13 at 09:13
