AttributeError: 'PDFPage' object has no attribute 'extractText'

Question

I am trying to extract the content of a PDF in order to create an Excel sheet from it.

What I tried

import pdfquery
pdf = pdfquery.PDFQuery('C:\\Users\\Santosh\\Downloads\\2017-San-Jamar-Price-List-US-Z120913E-RevA.pdf')
page = pdf.get_page(3)
page_content = page.extractText()
print (page_content)

It throws the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-32-d6b615faa422> in <module>() 
      1 page = pdf.get_page(3)
----> 2 page_content = page.extractText()
      3 print (page_content)

AttributeError: 'PDFPage' object has no attribute 'extractText'

Please let me know a possible solution.

where is your source saying that a PDF page object **should** have a method called `extractText`? — Tadhg McDonald-Jensen, Jun 07 '17 at 02:16
2023 update: [`PyPDF2` is deprecated](https://pypi.org/project/PyPDF2/). **Use `pypdf`**. I'm the maintainer of both libraries — Martin Thoma, Jul 01 '23 at 22:00

score 2 · Answer 1 · edited Jul 01 '23 at 21:59


Use PyPDF2 instead of pdfquery

from PyPDF2 import PdfReader

reader = PdfReader('C:\\Users\\Santosh\\Downloads\\2017-San-Jamar-Price-List-US-Z120913E-RevA.pdf')
page = reader.pages[3]
print(page.extract_text())
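
If the same script has to run against different PyPDF2 or pypdf installs, a small wrapper that tolerates both method spellings may help. This is only a minimal sketch: the helper name `extract_page_text` is hypothetical and not part of either library. It assumes the reader exposes `pages` and that each page offers `extract_text` (the current name) or `extractText` (the legacy camelCase name from older PyPDF2 releases).

```python
def extract_page_text(reader, page_index):
    """Return the text of one page from a PdfReader-like object.

    `reader` is assumed to expose a `pages` sequence, as PdfReader does.
    """
    page = reader.pages[page_index]  # zero-based index

    # Prefer the current snake_case name, then fall back to the legacy one.
    for method_name in ("extract_text", "extractText"):
        extractor = getattr(page, method_name, None)
        if callable(extractor):
            return extractor()

    raise AttributeError(
        f"{type(page).__name__!r} object has no text extraction method; "
        "it may not come from PyPDF2 or pypdf"
    )


# Usage, assuming pypdf is installed and the path exists (hypothetical file name):
# from pypdf import PdfReader
# reader = PdfReader("price-list.pdf")
# print(extract_page_text(reader, 3))
```

Note that `reader.pages[3]` is zero-based, so it refers to the fourth page of the document.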

edited Jul 01 '23 at 21:59

Martin Thoma


answered Nov 16 '18 at 08:36

Tejas Mankar


Berlin Benilo · Answer 2 · 2022-10-11T09:16:46.073

1

I faced the same issue. It is caused by an outdated version of the PyPDF2 package that was already installed alongside other PDF reader dependencies. Reinstalling PyPDF2 resolved the error.

pip uninstall pypdf2
pip install pypdf2

This worked for me
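
Before reinstalling, it can be useful to see which version is actually present. The snippet below is a rough sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is made up for illustration, and the distribution names passed to it are assumed to match what pip recorded at install time.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report what is installed in the current environment.
for name in ("PyPDF2", "pypdf"):
    print(name, installed_version(name) or "not installed")
```

Run it with the same interpreter that raises the error, since a notebook kernel may use a different environment than the shell where `pip` was run.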

edited Oct 11 '22 at 09:16

answered May 26 '22 at 04:27

Berlin Benilo


score 0 · Answer 3 · answered Jun 27 '23 at 10:20


I reinstalled PyPDF2 after uninstalling both PyPDF and PyPDF2, and the issue was resolved.

pip uninstall PyPDF
pip uninstall PyPDF2
pip install PyPDF2
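
To see which PDF-related packages are installed side by side before uninstalling anything, something like the following sketch could be used. It relies only on the standard library; the helper name `find_distributions` is hypothetical, and matching on the substring "pdf" is just an assumption about how the relevant packages are named.

```python
from importlib.metadata import distributions


def find_distributions(substring):
    """Return sorted (name, version) pairs for installed distributions
    whose name contains `substring` (case-insensitive)."""
    needle = substring.lower()
    matches = set()
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # skip installs with missing metadata
            continue
        name = meta.get("Name") or ""
        if needle in name.lower():
            matches.add((name, meta.get("Version") or "unknown"))
    return sorted(matches)


# List every installed distribution with "pdf" in its name.
for name, ver in find_distributions("pdf"):
    print(name, ver)
```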

answered Jun 27 '23 at 10:20

Hammad Zafar Bawara
