I am trying to use PyPDF2 to get the number of pages of every PDF in a directory. I can use .getNumPages() to find the number of pages in a single PDF file, but I need to walk through a directory and get the page count for every file in it. Any ideas?
Here is the code I have so far:
import pandas as pd
import os
from PyPDF2 import PdfFileReader

df = pd.DataFrame(columns=['fileName', 'fileLocation', 'pageNumber'])
pdf = PdfFileReader(open('path/to/file.pdf', 'rb'))
for root, dirs, files in os.walk(r'Directory path'):
    for file in files:
        if file.endswith(".pdf"):
            df2 = pd.DataFrame([[file, os.path.join(root, file), pdf.getNumPages()]], columns=['fileName', 'fileLocation', 'pageNumber'])
            df = df.append(df2, ignore_index=True)
This code only ever adds the page count of the one PDF file opened before the loop, so every row in the dataframe gets the same number. If I try to pass a directory path to PdfFileReader() instead, I get:
PermissionError: [Errno 13] Permission denied
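One possible direction, offered as a minimal sketch rather than a definitive fix: open each PDF inside the loop, so that every file gets its own reader, instead of opening a single file (or a directory, which open() cannot read as a file) before the loop. The helper names count_pages_pypdf2 and collect_page_counts are hypothetical and not part of PyPDF2, and the case-insensitive extension check is an added assumption. The sketch keeps the PdfFileReader / getNumPages calls from the code above; newer PyPDF2 releases rename these to PdfReader and len(reader.pages).

```python
import os


def count_pages_pypdf2(path):
    """Return the page count of one PDF (hypothetical helper, PyPDF2 1.x/2.x API)."""
    # Imported here so the directory walk below can be tried without PyPDF2.
    from PyPDF2 import PdfFileReader

    # Open the individual file, not the directory, and close it when done.
    with open(path, 'rb') as fh:
        return PdfFileReader(fh).getNumPages()


def collect_page_counts(directory, count_pages=count_pages_pypdf2):
    """Walk `directory` and return one row (dict) per PDF file found."""
    rows = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            # Assumption: match ".pdf" regardless of case.
            if file.lower().endswith('.pdf'):
                path = os.path.join(root, file)
                rows.append({
                    'fileName': file,
                    'fileLocation': path,
                    # The count is computed per file, inside the loop.
                    'pageNumber': count_pages(path),
                })
    return rows
```

The rows can then be turned into the dataframe in one step, for example pd.DataFrame(collect_page_counts(r'Directory path'), columns=['fileName', 'fileLocation', 'pageNumber']), which also avoids calling df.append once per file. Encrypted or damaged PDFs may still raise an error in count_pages_pypdf2; this sketch does not handle that.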