How create a table of contents page in the PDF file from the bookmarks with iText?

Question

I need to create a page in the PDF to the content of table. I will create reading bookmark in PDF.

With iText I use:

tmp = SimpleBookmark.getBookmark (reader);

Testing with this PDF :

Download file PDF

Returns this MAP:

[{Action = GoTo, Named = __ WKANCHOR_2, Title = Secretariat Teste0}, {Action = GoTo, Named = __ WKANCHOR_4, Title = Secretariat TestBook1}, {Action = GoTo, Named = __ WKANCHOR_6, Title = Secretariat Test2}, {Action = GoTo , Named = __ WKANCHOR_8 ...

Without the page number.

How could show one content of table with title and page number?

I would like to show this:

Your question is phrased as a "home work" question. That is: you ask the readers on Stack Overflow to do your work in your place. You can change this question in an *on topic* question by showing what you've tried. I have provided an answer with links to the [official documentation](http://developers.itextpdf.com) and previous answers on Stack Overflow. Now it's up to you to spend some time experimenting with those pointers. If your next comment sounds like "I don't have the time to do that, just give me the code", I predict that you will get many down-votes. — Bruno Lowagie, Aug 19 '16 at 07:25
The link to your PDF isn't a link to a PDF. It's a link to a screen shot. I hope you understand that this is not professional. We can examine the bookmarks in a PDF to find out if they consist of explicit destinations or named destinations. We can't do that if you share a PNG. As a developer you should know that (but maybe you're not a developer). — Bruno Lowagie, Aug 19 '16 at 07:28
Bruno thanks for the comment. I am generating pdf using wkhtmltopdf with it generate the PDF with bookmarks. When I walk the bookmarks (with itext) returns me a list of maps without the page number but with a link to an anchor. My main question is one way to get the page number of each anchor without getting traversing the pdf title search. — Fagner Souza, Aug 19 '16 at 12:27
Link to file PDF. https://drive.google.com/file/d/0BxssB4RX7BnWekdlbnNyUkhuemM/view?usp=sharing — Fagner Souza, Aug 19 '16 at 12:54

score 3 · Accepted Answer · edited May 23 '17 at 10:32

Please read the answer to this question: Java: Reading PDF bookmark names with itext

It explains how you can use the SimpleBookmark method to get the titles of an outline tree (this is how "bookmarks" are called in the PDF specification).

public void inspectPdf(String filename) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(filename);
    List<HashMap<String,Object>> bookmarks = SimpleBookmark.getBookmark(reader);
    for (int i = 0; i < bookmarks.size(); i++){
        showTitle(bookmarks.get(i));
    }
    reader.close();
}

public void showTitle(HashMap<String, Object> bm) {
    System.out.println((String)bm.get("Title"));
    List<HashMap<String,Object>> kids = (List<HashMap<String,Object>>)bm.get("Kids");
    if (kids != null) {
        for (int i = 0; i < kids.size(); i++) {
            showTitle(kids.get(i));
        }
    }
}

Then read the answer to this question: Set inherit Zoom(action property) to bookmark in the pdf file

You'll see that the HashMap<String, Object> doesn't only contain an entry with key "Title", but that it can also contain an entry with key "Page". That is the case when the bookmark points at a page. The value will be an explicit destination. It will consist of the page number, a value such as Fit, FitH, FitB, XYZ, followed by some parameters that mark the position.

If you look at the CreateOutlineTree example, you'll see that you can also extract the bookmarks as an XML file:

public void createXml(String src, String dest) throws IOException {
    PdfReader reader = new PdfReader(src);
    List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
    SimpleBookmark.exportToXML(list,
            new FileOutputStream(dest), "ISO8859-1", true);
    reader.close();
}

This is a screenshot from a book I wrote about iText that shows you which keys you can expect in a bookmark entry:

As you can tell from this table, a link can also be expressed as a named destination. In that case, you won't get the page number, but a name. To get the page number, you need to extract the list of named destinations. This list will get you the explicit destination corresponding with the named destination.

That is also explained in the book, as well as in the official documentation.

Once you have the titles and the page numbers (retrieved with code written based on the above pointers), you can insert pages to the PDF file using PdfStamper and the insertPage() method. You can put the TOC on those pages using ColumnText, or you can create a separate PDF for the TOC and merge it with the original one. See How to add a cover/PDF in a existing iText document to find out more about these two techniques.

You will also benefit from this example: Create Index File(TOC) for merged pdf using itext library in java

As for the dashed line between the title and the page number, that's done using a separator, more specifically a dotted line separator. You should read this question first: iTextSharp - Is it possible to set a different alignment in the same cell for text

Then read this question: How to Generate Table-of-Figures Dot Leaders in a PdfPCell for the Last Line of Wrapped Text (or this question It is possible with itext 5 which at the end of a paragraph justified the remaining space is filled with scripts?)

Note that your question is actually off-topic. It's phrased as a "home work" question. It invites people to do your work in your place. Now that you have all the elements you need, you should be able to do the work yourself. If you don't succeed, you should write an on topic Stack Overflow question. That's a question in which you show what you've tried and explain the technical problem you experience.

Update:

You shared a document with the following outline tree:

As you can see, the bookmarks are defined using Named destinations, such as /__WKANCHOR_2, /__WKANCHOR_4, and so on. As you can tell from the / character, the names are stored as PDF name objects (PDF 1.1), not as PDF string objects (since 1.2). The most recent PDF standards recommend to use PDF string objects instead of PDF name objects, you may want to ask the vendor of your PDF generation software to update the software to meet the recommendations of the most recent PDF standards.

Nevertheless, we can easily get the explicit destinations that correspond with those named destinations. They are stored in the /Dests entry of the root dictionary:

When you look at the way the destinations you see another problem that should be reported to wkhtmltopdf. Let's take a look at what the ISO standard tells us about the syntax to be used for destinations:

The concept of page numbers doesn't exist in PDF. Pages are described using page dictionaries, and the page number is derived from the position of the page in the page tree. The first page that is encountered in the page tree is page 1, the second page that is encountered is page 2, and so on.

In your example, the explication destinations are defined like this: [9/XYZ 30.2400000 524.179999 0], [9/XYZ 30.2400000 231.379999 0], and so on.

This is wrong. The ISO standard says that the first value in the array needs to be an indirect reference. An indirect reference has the format 9 0 R, not 9. I looked at the structure of the document, and I see that wkhtmltopdf uses a page number - 1 instead of an indirect reference. If I look at /__WKANCHOR_2, it refers to [0/XYZ 30.240000 781.459999 0] and that 0 is supposed to point to page 1. As Adobe Reader tolerates crappy software, this works in Adobe Reader, but as the file is in violation with ISO-32000, iText doesn't know what to do with those misleading destinations, at least, the convience class SimpleNamedDEstination doesn't know what to do with it.

Fortunately, iText is a very versatile library that allows you to go deep under the hood of a PDF. In this case, we only have to go one level deeper. Instead of SimpleNamedDestination.getNamedDestination(reader, true), we can use the following approach:

HashMap<String, PdfObject> names = reader.getNamedDestinationFromNames();
for (Map.Entry<String, PdfObject> entry: names.entrySet()) {
    System.out.print(entry.getKey());
    System.out.print(": p");
    PdfArray arr = (PdfArray)entry.getValue();
    System.out.println(arr.getAsNumber(0).intValue() + 1);
}
reader.close();

The output of this method is:

__WKANCHOR_w: p7
__WKANCHOR_y: p7
__WKANCHOR_2: p1
__WKANCHOR_4: p1
__WKANCHOR_16: p9
__WKANCHOR_14: p8
__WKANCHOR_18: p9
__WKANCHOR_1s: p13
__WKANCHOR_a: p2
__WKANCHOR_1q: p13
__WKANCHOR_1o: p12
__WKANCHOR_12: p8
__WKANCHOR_1m: p12
__WKANCHOR_e: p3
__WKANCHOR_10: p7
__WKANCHOR_1k: p12
__WKANCHOR_c: p3
__WKANCHOR_1i: p11
__WKANCHOR_i: p4
__WKANCHOR_8: p2
__WKANCHOR_g: p3
__WKANCHOR_1g: p11
__WKANCHOR_6: p1
__WKANCHOR_1e: p10
__WKANCHOR_m: p5
__WKANCHOR_1c: p10
__WKANCHOR_k: p4
__WKANCHOR_q: p5
__WKANCHOR_1a: p9
__WKANCHOR_o: p5
__WKANCHOR_u: p6
__WKANCHOR_s: p6

If we check __WKANCHOR_2, we see that it correctly points at page 1. I checked the final link in the outlines, it points at the named destination with name __WKANCHOR_1s and indeed: that should link to page 13.

Your problem is a clear example of a "garbage in-garbage out" problem. Your tool produces PDFs that are in violation with the ISO standard for PDF, and as a result you lose plenty of time trying to figure out what's wrong. But what's even worse: you made me lose time because of someone else's fault.

This line: `HashMap map = SimpleNamedDestination.getNamedDestination(reader, false); ` return empty — Fagner Souza, Aug 19 '16 at 13:26
Do you have any property in the bookmark to return the page number. Or the anchor back what page it is? — Fagner Souza, Aug 19 '16 at 13:52
About question do 'getNamedDestination' return empty passing true or false — Fagner Souza, Aug 19 '16 at 14:01
If `getNamedDestination()` returns null on the PDF you shared, you may not be using the correct iText version, because it should return something. Anyhow, it's too late now. I'm going home. You have all the elements to parse the name tree yourself. If `getNamedDestination()` doesn't work you can always get the `/Dests` from the `/Catalog` yourself. — Bruno Lowagie, Aug 19 '16 at 14:04
Now I see, there is no bug in iText. There's a bug in wkhtmltopdf (and Adobe Reader tolerates it). I will update my answer. — Bruno Lowagie, Aug 19 '16 at 15:12
hey can u suggest a way of removing additional table of contents heading from table of contents in java? — Vikram Saini, Feb 13 '18 at 10:59

How create a table of contents page in the PDF file from the bookmarks with iText?

1 Answers1