I'm trying to extract the text from a PDF served at a URL.
If I download the PDF to disk, I can easily extract the text with slate. However, when I load the PDF into memory with io and try to extract the text, the output is empty. The code is attached below.
import requests, PyPDF2, io
from io import BytesIO
url = 'https://www.poderjudicial.es/search/contenidos.action?action=accessToPDF&publicinterface=true&tab=AN&reference=e3ca421447bc6b71&encode=true&optimize=20210216&databasematch=AN'
response = requests.get(url)
f = io.BytesIO(response.content)
with f as data:
    read_pdf = PyPDF2.PdfFileReader(data)
    page = read_pdf.getPage(1)
    print(page.extractText())
I have tried a bunch of other functions, but none of them work. Am I doing something wrong?
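One check that might help narrow this down is whether the bytes coming back from the URL are a PDF at all, rather than something like an HTML page. Below is a minimal sketch of such a check. The helper names `looks_like_pdf` and `describe_payload` are hypothetical (they are not part of requests or PyPDF2), and the 1 KiB search window is an assumption, not a rule from any library.

```python
def looks_like_pdf(content: bytes) -> bool:
    """Return True if the '%PDF-' header appears near the start of the payload."""
    # Assumption: search only the first 1 KiB, to allow for a little leading junk.
    return b"%PDF-" in content[:1024]


def describe_payload(content: bytes, content_type: str = "") -> str:
    """Summarise what the server sent back (hypothetical debugging helper)."""
    kind = "PDF" if looks_like_pdf(content) else "not a PDF"
    return f"{len(content)} bytes, Content-Type={content_type!r}, {kind}"
```

With the code above, this could be called as `print(describe_payload(response.content, response.headers.get('Content-Type', '')))` right after `requests.get(url)`. If it reports "not a PDF", the problem would be in the download rather than in the text extraction.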