1

I have a PDF document that might have been created by extracting a few pages from another PDF document. How do I get the page number? The starting page number is 572, whereas for a complete PDF document it should have been 1.

Do you think converting the PDF to XML would sort out this issue?

Bobrovsky
Deb

2 Answers

1

Most probably the document contains a /PageLabels entry in the Document Catalog. This entry specifies the numbering style for page numbers as well as the starting number.

You might have to update the starting number or remove the entry completely. The following document contains more information about the /PageLabels entry:

Example 2 in that document might be useful if you decide to update the entry.
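
To make the role of the starting number concrete, here is a minimal plain-Java sketch of how a viewer could turn a page index into the displayed label. It is only an illustration: `PageLabelSketch`, `Range`, and `labelFor` are made-up names, nothing here reads a real PDF, and only the decimal numbering style (/S /D) is modelled. The fallback of 1 for a missing /St follows the default given in the PDF specification.

```java
// Sketch only: models /PageLabels ranges in memory; all names are hypothetical.
final class PageLabelSketch {

    /** One entry of the /Nums array: a range that begins at a 0-based page index. */
    static final class Range {
        final int firstPageIndex; // key in /Nums
        final String prefix;      // /P, empty when absent
        final int start;          // /St, 1 when absent

        Range(int firstPageIndex, String prefix, int start) {
            this.firstPageIndex = firstPageIndex;
            this.prefix = prefix;
            this.start = start;
        }
    }

    /** Returns the decimal label for pageIndex; ranges must be sorted by firstPageIndex. */
    static String labelFor(Range[] ranges, int pageIndex) {
        Range active = null;
        for (Range r : ranges) {
            if (r.firstPageIndex <= pageIndex) {
                active = r;   // the last range starting at or before the page wins
            } else {
                break;
            }
        }
        if (active == null) {
            // No range covers the page: fall back to plain 1-based numbering.
            return String.valueOf(pageIndex + 1);
        }
        int number = active.start + (pageIndex - active.firstPageIndex);
        return active.prefix + number;
    }

    public static void main(String[] args) {
        // Corresponds to: /PageLabels << /Nums [ 0 << /S /D /St 572 >> ] >>
        Range[] extracted = { new Range(0, "", 572) };
        System.out.println(labelFor(extracted, 0)); // first physical page
        System.out.println(labelFor(extracted, 3)); // fourth physical page
    }
}
```

With a single range starting at 572, the first physical page is labelled 572 and every later page counts up from there, which matches what the question describes. Changing the start to 1, or dropping the range, gives the plain 1-based numbering.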

Bobrovsky
  • Thanks for the response, but I am actually trying to access it from within my Java program. Is there a way to get a handle to it from a PdfReader object? – Deb May 31 '13 at 17:43
1

I finally figured it out using iText. It would not have been possible without Bobrovsky's hint, so tons of thanks to him. Here is the code sample:

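// Requires iText 5 (com.itextpdf.text.pdf.*) and java.util.ListIterator.
public void process(PdfReader reader) {
    // /PageLabels lives in the document catalog; getAsDict follows an indirect reference if present.
    PdfDictionary pageLabels = reader.getCatalog().getAsDict(com.itextpdf.text.pdf.PdfName.PAGELABELS);
    if (pageLabels == null) {
        System.out.println("No /PageLabels entry found");
        return;
    }
    // /Nums holds pairs of (page index, label dictionary); it is null if the tree uses /Kids instead.
    PdfArray array = pageLabels.getAsArray(com.itextpdf.text.pdf.PdfName.NUMS);
    System.out.println("Start Page: " + resolvePdfIndirectReference(array, reader));
}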

private static int resolvePdfIndirectReference(PdfObject obj, PdfReader reader) {
    if (obj instanceof PdfArray) {
        PdfDictionary subDict = null;
        PdfIndirectReference indRef = null;
        ListIterator<PdfObject> itr = ((PdfArray) obj).listIterator();
        while (itr.hasNext()) {
            PdfObject pdfObj = itr.next();
            if (pdfObj instanceof PdfIndirectReference)
                indRef = (PdfIndirectReference) pdfObj;
            if (pdfObj instanceof PdfDictionary) {
                subDict = (PdfDictionary) pdfObj;
                break;
            }
        }
        if (subDict != null) {
            return resolvePdfIndirectReference(subDict, reader);
        } else if (indRef != null)
            return resolvePdfIndirectReference(indRef, reader);
    } else if (obj instanceof PdfIndirectReference) {
        PdfObject ref = reader.getPdfObject(((PdfIndirectReference) obj).getNumber());
        return resolvePdfIndirectReference(ref, reader);
    } else if (obj instanceof PdfDictionary) {
        // /St is optional; the PDF specification gives it a default of 1.
        PdfNumber num = ((PdfDictionary) obj).getAsNumber(com.itextpdf.text.pdf.PdfName.ST);
        return num != null ? num.intValue() : 1;
    }
    return 0;
}
Jordan Reiter
Deb