I am trying to build a PDF puller for the Australian Stock Exchange (ASX) website that will let me search through all the 'Announcements' made by companies and look for keywords in the PDFs of those announcements.
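Once the text of an announcement has been extracted, the keyword search itself is ordinary string processing. A minimal sketch, assuming the text is already available as a Python `str`. `keyword_hits` and the sample sentence are hypothetical, and keywords are assumed to start and end with letters or digits so that the `\b` word boundaries behave as expected:

```python
import re
from typing import Dict, Iterable


def keyword_hits(text: str, keywords: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each keyword in text."""
    counts = {}
    for keyword in keywords:
        # re.escape treats the keyword literally; \b anchors it to word boundaries.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        counts[keyword] = len(pattern.findall(text))
    return counts


# Hypothetical announcement text, for illustration only.
sample = "Quarterly activities report. The quarterly cash flow fell."
print(keyword_hits(sample, ["quarterly", "dividend", "cash flow"]))
# {'quarterly': 2, 'dividend': 0, 'cash flow': 1}
```

Compiling one pattern per keyword keeps the per-keyword counts separate; for very long keyword lists a single alternation pattern would be faster.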
So far I have used the `requests` library. Here is my code:
```python
import requests

url = 'http://www.asx.com.au/asxpdf/20171103/pdf/43nyyw9r820c6r.pdf'
response = requests.get(url)
print(response.content)
```
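Since `response.content` is the raw response body as `bytes`, one natural step is to write it straight to a `.pdf` file instead of printing it. A minimal sketch, assuming the URL really returned a PDF; `save_pdf` and `pdf_version` are hypothetical helpers, not part of `requests`:

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def save_pdf(data: bytes, path: str) -> Path:
    """Write raw PDF bytes to disk after a basic sanity check on the header."""
    if not data.startswith(PDF_MAGIC):
        raise ValueError("response body does not look like a PDF")
    out = Path(path)
    out.write_bytes(data)  # binary write: no text decoding is involved
    return out


def pdf_version(data: bytes) -> str:
    """Return the version from the header line, e.g. '1.5' for b'%PDF-1.5'."""
    first_line = data.split(b"\r", 1)[0].split(b"\n", 1)[0]
    return first_line[len(PDF_MAGIC):].decode("ascii")
```

Usage would be something like `save_pdf(response.content, "announcement.pdf")` (the file name is arbitrary); for the output shown below, `pdf_version(response.content)` would return `'1.5'`.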
However, what prints is the following (truncated, as the full output is far too long):
```
b'%PDF-1.5\r%\xe2\xe3\xcf\xd3\r\n5 0 obj\r<</E 212221/H [ 1081 145 ]/L 212973/Linearized 1/N 1/O 8/T 212553>>\rendobj\r \r\r42 0 obj\r<</DecodeParms <</Columns 5/Predictor 12>>/Encrypt 7 0 R/Filter /FlateDecode/ID [(\\216\\203\\217T\\n\\f\\236\\345?%\\214t4 E\\271) (\\216\\203\\217T\\n\\f\\236\\345?%\\214t4 E\\271)]/Index [5 38]/Info 3 0 R/Length 86/Prev 212554/Root 6 0 R/Size 43/Type /XRef/W [1 3 1]>>\rstream\nx\x9ccbd`\x10``b``:\x04"\x19\xab\xc1d-X\xc4\x06D2\xac\x02\xb3\x93\xc0\xe2\x1d \x92?\x07,\x1e\t"\xb9T\x80$\xe3\x84\xcb@\x92\xa9m"\x03\x13\xe3\xdf\x13Z`Y\x06\xc6\x01#\xff3\xb0h\xbcfb`\xb6\x12\x02\xba\xe4\xef!S\x06\x0
```
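The `/Filter /FlateDecode` entry in this output means the bytes between `stream` and `endstream` are zlib-compressed, so no text encoding can make the raw bytes readable. To illustrate the idea only, here is a standard-library sketch that finds such streams and inflates them. `inflate_streams` is hypothetical; it assumes unencrypted streams using plain zlib and ignores any `/DecodeParms` predictor. It is not a PDF parser: this particular file has an `/Encrypt` entry, decompressed page content consists of PDF drawing operators rather than plain text, and a dedicated PDF library such as pypdf or pdfminer.six is the practical way to extract text.

```python
import re
import zlib
from typing import List

# "stream" + end-of-line, then everything (lazily) up to the next "endstream".
# The captured bytes may keep a trailing EOL before "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> List[bytes]:
    """Inflate every complete zlib stream found in raw PDF bytes.

    Sketch only: encrypted streams or streams using another filter fail to
    decompress and are skipped.
    """
    inflated = []
    for match in STREAM_RE.finditer(data):
        decomp = zlib.decompressobj()
        try:
            out = decomp.decompress(match.group(1))
        except zlib.error:
            continue  # not plain zlib data (other filter, encryption, ...)
        # eof is True once a complete zlib stream was read; any trailing
        # EOL bytes end up in decomp.unused_data and are ignored.
        if decomp.eof:
            inflated.append(out)
    return inflated
```

Using `zlib.decompressobj()` rather than `zlib.decompress()` makes the handling of trailing bytes explicit, and `eof` rejects truncated data instead of returning a partial result.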
I have searched Stack Exchange and other websites for a few days, and have tried `print(response.content.decode('utf-8'))` as well as `'ascii'`, but neither produces anything I can read.
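Both attempts fail for the same underlying reason: a PDF is a binary format. Right after the `%PDF-1.5` header the file contains a comment line of bytes 0x80 or higher (`\xe2\xe3\xcf\xd3` in the output above), which the PDF specification recommends so that tools treat the file as binary. Those bytes are invalid ASCII and, in this order, invalid UTF-8. A small illustration using only the header bytes copied from the output (`try_decode` is a hypothetical helper):

```python
from typing import Optional

# First bytes of the printed output, copied from the question.
HEADER = b"%PDF-1.5\r%\xe2\xe3\xcf\xd3\r\n"


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return data decoded with `encoding`, or None if the bytes are invalid for it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for enc in ("utf-8", "ascii"):
    print(enc, try_decode(HEADER, enc))
# utf-8 None
# ascii None
```

An encoding such as `latin-1` maps every byte to some character and therefore never fails, but because most of a PDF's content lives in compressed streams the result is still unreadable; getting text out requires a PDF parser, not a different text encoding.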
Apologies, as I know it's obvious that I am a newbie, but any help would be greatly appreciated!
Thanks a lot.