I'm trying to scrape the text from the PDF file at https://www.blackhawk.edu/Portals/0/Public%20PDFs/2019-20/Blackhawk-Staged-Reopening-Plan-2.pdf?ver=2020-07-09-171645-080. I tried the following code, but it failed.
import requests
import PyPDF2

url = "https://www.blackhawk.edu/Portals/0/Public%20PDFs/2019-20/Blackhawk-Staged-Reopening-Plan-2.pdf?ver=2020-07-09-171645-080"

# Download the PDF and save it locally
pdf = requests.get(url).content
with open("my_pdf.pdf", "wb") as my_data:
    my_data.write(pdf)

# Read the saved file back and extract the text of every page
with open("my_pdf.pdf", "rb") as open_pdf_file:
    read_pdf = PyPDF2.PdfFileReader(open_pdf_file)
    n = read_pdf.getNumPages()
    temp = [read_pdf.getPage(i) for i in range(n)]  # make a list of pages
    temp2 = [d.extractText() for d in temp]
temp2 = "".join(temp2)
temp2 = ext_context(temp2, type="pdf")  # ext_context: helper not shown here
print(temp2)
Only some empty circles were scraped, not the text I need. I am new to Python. Any help is appreciated. Thank you in advance for your time.
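In case it helps with diagnosing this, here is a minimal sketch of how the extracted string could be inspected to see which characters actually came back. The function name `summarize_characters` and the `sample` string are made up for illustration; I am only assuming that the "empty circles" are bullet glyphs such as U+25CB, which may not be what the real output contains.

```python
import unicodedata
from collections import Counter


def summarize_characters(text):
    """Count each non-whitespace character and pair it with its Unicode name."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]


# Hypothetical sample standing in for the real extracted string (temp2 above).
sample = "\u25cb \n\u25cb \n\u25cb "

for ch, name, n in summarize_characters(sample):
    print(f"U+{ord(ch):04X} {name}: {n}")
```

Running the same function on the real extracted string would show whether anything other than bullet characters and whitespace was returned.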