I want to extract the title of each page of a PDF, but my PDFs do not have a similar or predefined title size (the title size varies from page to page). I tried the following code, but it does not give the expected output; instead, it extracts the whole text of each page:

from PyPDF2 import PdfFileReader

filenames = ['Test2.pdf']
# filenames = ['sample-pdf-download-10-mb.pdf', 'sample-pdf-file.pdf', 'sample-pdf-with-images.pdf']

for filename in filenames:
    # Open in binary mode; the file is closed when the block exits
    with open(filename, 'rb') as pdfFileObj:
        pdfReader = PdfFileReader(pdfFileObj)
        text = ""

        for count in range(1, pdfReader.numPages + 1):
            pageObj = pdfReader.getPage(count - 1)
            text += pageObj.extractText()
            # Prints the whole page text, title-cased
            print(count, "= ", pageObj.extractText().title())

Also, how can I extract highlighted text from a PDF?
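
For the per-page title, one possible heuristic (an assumption, not something the PDF format guarantees) is to treat the text set in the largest font size on a page as its title. `extractText()` returns plain text without font sizes, so the sketch below assumes PyMuPDF (`fitz`) and the blocks/lines/spans layout of its `page.get_text("dict")`. The helper names `pick_title` and `titles_from_pdf` are hypothetical, made up for illustration.

```python
def pick_title(page_dict):
    """Return the text of the largest-font spans on a page, or "" if none.

    `page_dict` is assumed to follow the layout of PyMuPDF's
    page.get_text("dict"): blocks -> lines -> spans, where each
    span carries a "size" (font size) and a "text" entry.
    """
    spans = [
        span
        for block in page_dict.get("blocks", [])
        for line in block.get("lines", [])  # image blocks have no "lines"
        for span in line.get("spans", [])
        if span.get("text", "").strip()     # skip whitespace-only spans
    ]
    if not spans:
        return ""
    largest = max(span["size"] for span in spans)
    # Join every span within 0.1 pt of the largest size,
    # in the order the spans were returned.
    parts = [s["text"].strip() for s in spans if abs(s["size"] - largest) < 0.1]
    return " ".join(parts)


def titles_from_pdf(path):
    """Yield (page_number, title) pairs; needs PyMuPDF (pip install pymupdf)."""
    import fitz  # imported here so pick_title works without PyMuPDF installed

    with fitz.open(path) as doc:
        for number, page in enumerate(doc, start=1):
            yield number, pick_title(page.get_text("dict"))
```

Usage would be along the lines of `for number, title in titles_from_pdf('Test2.pdf'): print(number, "=", title)`. The heuristic fails on pages where something other than the title uses the largest font, so the results would need checking against the actual files.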

molbdnilo

0 Answers