I want to save a listed company's announcement from a PDF URL. However, the output file of my Python code turns out to be empty.
I also tried to extract the text from the PDF directly, but it is in Simplified Chinese, and even UTF-16 cannot decode it completely.
Please help. Here is my code:
import requests
from PyPDF2 import PdfFileReader, PdfFileWriter

url_pdf = 'http://static.sse.com.cn/disclosure/listedinfo/announcement/c/2018-11-15/601318_20181115_1.pdf'

# Download the PDF and save the raw bytes to disk
r = requests.get(url_pdf)
fo = open('file_name.pdf', 'wb')
fo.write(r.content)
fo.close()

# Re-open the saved file and try to extract text from it
with open('file_name.pdf', 'rb') as file:
    pdf = PdfFileReader(file)
    info = pdf.getDocumentInfo()
    pages = pdf.numPages
    # Page indices are zero-based, so getPage(1) is the second page
    print(pdf.getPage(1).extractText())
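
To narrow down where things go wrong, one check that might help is confirming that the downloaded bytes really are a PDF before handing them to PyPDF2. Below is a minimal sketch of that check, not a confirmed fix. The helper names `looks_like_pdf` and `save_if_pdf` are hypothetical, and the 1024-byte window is an assumption based on how leniently many readers locate the `%PDF-` header.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes appear to be a PDF (hypothetical helper)."""
    # A PDF carries a "%PDF-" header; assume it may sit anywhere
    # in the first 1024 bytes, since many readers tolerate that.
    return b"%PDF-" in data[:1024]


def save_if_pdf(data: bytes, path: str) -> bool:
    """Write data to path only if it looks like a PDF; report whether it was written."""
    if not looks_like_pdf(data):
        return False
    with open(path, "wb") as fo:
        fo.write(data)
    return True
```

In the script above this could be called as `save_if_pdf(r.content, 'file_name.pdf')` after `r.raise_for_status()`. A `False` result would suggest the server returned something other than the PDF (an error page, for example), while a `True` result would point to the text extraction step instead.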