Xref table not zero-indexed. ID numbers for objects will be corrected. won't continue

Question

I am trying to open a PDF to get the number of pages. I am using PyPDF2.

Here is my code:

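import PyPDF2

def pdfPageReader(file_name):
    try:
        reader = PyPDF2.PdfReader(file_name, strict=True)
        number_of_pages = len(reader.pages)
        print(f"{file_name} = {number_of_pages}")
        return number_of_pages
    except:  # a bare except also catches KeyboardInterrupt (Ctrl+C)
        return 1  # fall back to 1 page (an int, like the success path)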

But then I run into this warning:

PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]

I tried both strict=True and strict=False. With strict=True, this message is displayed and then nothing happens; I waited for 30 minutes and it never finished. With strict=False, nothing is displayed at all and the script just hangs. If I press Ctrl+C in the terminal (cmd, Windows 10), it cancels opening that file and continues (I run this over a batch of PDF files). Only one file in the batch has this problem.

My questions are: how do I fix this, how do I skip this file, or how can I cancel opening it and move on to the other PDF files?
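One way to skip a file that hangs, instead of pressing Ctrl+C, is to open each PDF in a separate Python process and kill that process after a time limit. This is a minimal sketch under stated assumptions, not something taken from the question or the answers: `run_python_with_timeout`, `count_pages`, `PAGE_COUNT_SCRIPT` and the 30-second default are hypothetical names and values, and the child script assumes PyPDF2 2.x or later (`PdfReader` and `.pages`).

```python
import subprocess
import sys

# Hypothetical child script: opens one PDF and prints its page count.
# Assumes PyPDF2 2.x or later (PdfReader and the .pages attribute).
PAGE_COUNT_SCRIPT = """
import sys
from PyPDF2 import PdfReader

reader = PdfReader(sys.argv[1], strict=False)
print(len(reader.pages))
"""


def run_python_with_timeout(script, args, timeout):
    """Run `script` in a fresh Python interpreter with `args` in sys.argv[1:].

    Returns the stripped stdout, or None if the child exits with an error
    or is still running after `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def count_pages(file_name, timeout=30):
    """Return the page count of `file_name`, or None if it fails or hangs."""
    output = run_python_with_timeout(PAGE_COUNT_SCRIPT, [file_name], timeout)
    if output is None or not output.isdigit():
        return None
    return int(output)
```

In the batch loop, a `None` from `count_pages(file_name)` means that file either raised an error or exceeded the time limit, so it can be logged and skipped. A separate process is used because Python offers no way to forcibly stop a thread that is stuck inside the parser, whereas `subprocess.run` kills the child when `timeout` expires.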

score 30 · Answer 1 · edited May 22 '22 at 20:22

If somebody has a similar problem and it even crashes the program with this error message:

File "C:\Programy\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1604, in getObject
    % (indirectReference.idnum, indirectReference.generation, idnum, generation))
PyPDF2.utils.PdfReadError: Expected object ID (14 0) does not match actual (13 0); xref table not zero-indexed.

What helped me was setting the strict argument to False for my PDF reader:

from PyPDF2 import PdfReader

pdf_reader = PdfReader(input_file, strict=False)

edited May 22 '22 at 20:22

Martin Thoma

answered Jan 30 '20 at 13:20

DovaX

This worked for me but it might be interesting to know why. – Matt Cremeens Apr 13 '21 at 01:07

interesting or tedious? – Chris Jul 21 '21 at 13:22
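As for why: in a classic PDF cross-reference (xref) table, the first subsection normally starts at object 0, whose entry is the head of the free list (`0000000000 65535 f`). Judging by the messages quoted in this thread, PyPDF2 warns when the first subsection starts at some other number, and in strict mode later raises `PdfReadError` when an object's actual ID does not match the expected one (14 vs 13 in the traceback above). The minimal sketch below uses a hypothetical helper, `first_xref_object_number`, that reads that first number from the raw bytes; it assumes an uncompressed classic table at the offset given by the last `startxref`, and does not handle xref streams or damaged offsets.

```python
import re


def first_xref_object_number(pdf_bytes):
    """Return the first object number of the xref table that 'startxref'
    points to, or None if no classic 'xref' table is found there."""
    # 'startxref' near the end of the file is followed by the byte offset
    # of the most recent cross-reference section.
    pos = pdf_bytes.rfind(b"startxref")
    if pos == -1:
        return None
    offset_match = re.match(rb"startxref\s+(\d+)", pdf_bytes[pos:])
    if offset_match is None:
        return None
    offset = int(offset_match.group(1))
    # A classic table starts with 'xref' followed by '<first> <count>'.
    table_match = re.match(rb"xref\s+(\d+)\s+(\d+)", pdf_bytes[offset:])
    if table_match is None:
        return None  # e.g. a cross-reference stream, or a bad offset
    return int(table_match.group(1))
```

Calling it on the bytes of a suspect file (read with `open(path, "rb")`) and getting a value other than 0 would be consistent with the "not zero-indexed" warning.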

Bill M. · Answer 2 · 2023-07-07T00:30:38.563

For anybody else who may be running into this problem, and found that strict=False didn't help, I was able to solve the problem by just re-saving a new copy of the file in Adobe Acrobat Reader. I just opened the PDF file inside an actual copy of Adobe Acrobat Reader (the plain ol' free version on Windows), did a "Save as...", and gave the file a new name. Then I ran my script again using the newly saved copy of my PDF file.

Apparently, the PDF file I was using, which was generated directly from my scanner, was somehow corrupt, even though I could open and view it just fine in Reader. Making a duplicate copy of the file via re-saving in Acrobat Reader somehow seemed to correct whatever was missing.

Good! It worked for me: I opened the PDF in Adobe Acrobat and used "Save as...". The file went from 900 KB to 500 KB, and now it works. – Camilo, Apr 20 '23 at 19:53

score 5 · Answer 3 · answered Aug 28 '18 at 09:53

I had the same problem and looked for a way to skip it. I am not a programmer, but the Python documentation on warnings includes a piece of code that lets you suppress warnings like this one.

Although I wouldn't recommend this as a solution, the piece of code I used for my purpose is (copied and pasted from the linked documentation):

import sys

# Ignore all warnings, unless warning options (e.g. -W) were passed to the interpreter
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

answered Aug 28 '18 at 09:53

cektek1

This way you are not solving the issue, just hiding it so you don't see it. I just want you to understand that. – Guilherme May 17 '23 at 11:22
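A narrower variant of the snippet above, sketched here as an assumption rather than taken from the answer, silences only this one message and only around a single call, using `warnings.catch_warnings()` so the previous filter state is restored afterwards. As the comment above points out, it only hides the message and does nothing for a file that hangs. It also assumes your PyPDF2 version emits the message through the `warnings` module (as the `PdfReadWarning` in the question suggests); newer releases may log it through `logging` instead, in which case this filter has no effect. `call_quietly` is a hypothetical helper name.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func with the xref warning suppressed for this call only.

    Unlike a global warnings.simplefilter("ignore"), the previous filters
    are restored when the `with` block exits.
    """
    with warnings.catch_warnings():
        # Only this message is ignored; other warnings still surface.
        warnings.filterwarnings("ignore", message="Xref table not zero-indexed")
        return func(*args, **kwargs)


# Usage (assumes PyPDF2 is installed):
#     reader = call_quietly(PdfReader, "file.pdf", strict=False)
```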

score 3 · Answer 4 · edited May 22 '22 at 20:22

This happens to me when the file was created by a printer/scanner combo that generates PDFs. I could still read the PDF with only a warning, so I read it in and rewrote it as a new file, which I could then append.

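from PyPDF2 import PdfMerger, PdfReader, PdfWriter

# strict=False lets PyPDF2 correct the misnumbered xref entries instead of raising
reader = PdfReader("scanner_generated.pdf", strict=False)
writer = PdfWriter()

# Copy every page into a fresh writer, which writes out its own new xref table
for page in reader.pages:
    writer.add_page(page)

with open("fixedPDF.pdf", "wb") as fp:
    writer.write(fp)

# The rewritten file can now be appended
merger = PdfMerger()
merger.append("fixedPDF.pdf")
merger.write("merged.pdf")  # hypothetical output file name
merger.close()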

score 0 · Answer 5 · answered Oct 23 '22 at 10:02

I had the exact same problem. The suggested solutions (setting strict=False and re-saving the document in Acrobat Reader) helped but didn't solve it completely: I still got a stream error. I was able to fix that with an online PDF repair tool. I used sejda.com, but be aware that you are uploading your PDF to a third-party website, so make sure there is nothing sensitive in it.
