I'm having trouble looping through URLs from a text file in order to get the titles of the PDFs they point to. When the file contains only one URL the code runs without problems, but when it contains more than one it throws the following error:

raise utils.PdfReadError("Could not read malformed PDF file")
PyPDF2.utils.PdfReadError: Could not read malformed PDF file

As for the text file, there is one URL per line, with no commas and no unusual formatting.
Any idea why this could be happening? (Apologies if my question is not well formatted, it's actually my first one.) :)

import io
import requests
from bs4 import BeautifulSoup
from PyPDF2 import PdfFileReader


def extract_info_from_pdf_url():
    with open('pdfs.txt') as urls:
        for url in urls:
            r = requests.get(url)
            f = io.BytesIO(r.content)
            reader = PdfFileReader(f)
            title = reader.getDocumentInfo().title
            print(url)
            print(title)


extract_info_from_pdf_url()
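
One thing worth checking, offered as a possible cause rather than a confirmed diagnosis: iterating over a file object yields each line with its trailing newline still attached, except for a final line that has no newline after it. That would fit the symptom, since a one-line file gives a clean URL while every earlier line in a longer file ends in "\n", and the server may then return an error page instead of a PDF. The sketch below only demonstrates the newline behavior and one way to strip it. The helper name `read_urls` and the example.com URLs are placeholders invented for illustration, and `io.StringIO` stands in for `open('pdfs.txt')`.

```python
import io


def read_urls(lines):
    """Return the non-empty lines with surrounding whitespace removed."""
    return [line.strip() for line in lines if line.strip()]


# Stand-in for open('pdfs.txt'); these URLs are placeholders, not real PDFs.
fake_file = io.StringIO("https://example.com/a.pdf\nhttps://example.com/b.pdf")

# Raw iteration: every line except an unterminated last one keeps its '\n'.
raw = list(fake_file)
for line in raw:
    print(repr(line))

# Stripped version: these are the strings that would be safe to pass on.
fake_file.seek(0)
clean = read_urls(fake_file)
print(clean)
```

If this turns out to be the cause, calling `url.strip()` before `requests.get(url)` in the loop above would be the corresponding change. Checking `r.status_code` and the response's Content-Type header before handing the bytes to `PdfFileReader` can also help confirm whether a PDF actually came back.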