
I am trying to access a PDF file on a bank's website for PDF mining, but the request keeps returning an HTTP 403 error. As a workaround, I am trying to change my User-Agent to a browser's so that I can access (and download) the file.

The code below is part of what I have right now. It produces the following warning:

C:\Users\Name\Anaconda3\lib\site-packages\ipykernel_launcher.py:8: DeprecationWarning: MyOpener style of invoking requests is deprecated. Use newer urlopen functions/methods

How do I fix this?

import urllib.request

my_url = 'someurl here'

class MyOpener(urllib.request.FancyURLopener):
    version = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
               'Gecko/20071127 Firefox/2.0.0.11')

myopener = MyOpener()

page = myopener.open(my_url)
page.read()
bullfighter

2 Answers


You can try this:

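# Python 2 only: in Python 3, urllib2 was split into urllib.request and urllib.error
import urllib2

def download_file(download_url):
    response = urllib2.urlopen(download_url)
    # Write in binary mode ('wb') because a PDF is not text
    with open("the_downloaded_file.pdf", 'wb') as f:
        f.write(response.read())

download_file("some url to pdf here")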
Vaibhav
  • Thanks for the tip! I managed to get it working with the 'requests' library instead however. – bullfighter Jan 18 '19 at 22:02
  • For me, `urllib2` was giving a squiggly line in my VS Code IDE. I [changed `urllib2` to `urllib` and it worked](https://stackoverflow.com/a/74883483/569302), based off [this answer](https://stackoverflow.com/a/54261548/569302) – Jesus is Lord Dec 22 '22 at 01:57
  • @bullfighter Don't update your *question* with the answer, but you can post your own answer. – CrazyChucky Dec 22 '22 at 02:15
  • `urllib2` only exists in Python 2. [Modern Python 3 simply has `urllib`](https://stackoverflow.com/questions/2018026/what-are-the-differences-between-the-urllib-urllib2-urllib3-and-requests-modul). – CrazyChucky Dec 22 '22 at 02:18
  • Neither urllib2 nor urllib works in base Python 3 – Łukasz Nojek Feb 06 '23 at 22:47
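In Python 3, the `urllib2` functionality lives in `urllib.request`, and the non-deprecated replacement for the question's `FancyURLopener` subclass is a `urllib.request.Request` carrying a `headers` dict, passed to `urlopen`. A minimal sketch, assuming the 403 is triggered by urllib's default `Python-urllib/3.x` User-Agent (if the server checks cookies or other headers, changing the User-Agent alone will not help). `download_pdf` is a hypothetical helper name, and the User-Agent string is the one from the question.

```python
import urllib.request

# Browser-like User-Agent copied from the question; any browser UA string can be substituted
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) "
              "Gecko/20071127 Firefox/2.0.0.11")


def download_pdf(url, out_path, user_agent=USER_AGENT):
    """Fetch url with a custom User-Agent and save the raw bytes to out_path.

    download_pdf is a hypothetical helper, not part of urllib.
    Returns the number of bytes written.
    """
    # Setting the header on the Request replaces the deprecated FancyURLopener.version trick
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen raises urllib.error.HTTPError if the server still answers 403
    with urllib.request.urlopen(req) as response:
        data = response.read()
    # A PDF is binary: write bytes, do not decode to str
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

Usage would be `download_pdf(my_url, "the_downloaded_file.pdf")`. If the server still refuses the request, `urlopen` raises `urllib.error.HTTPError`, whose `.code` attribute holds the status (for example 403).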

For me, `urllib2` was giving a squiggly line in my VS Code IDE (it only exists in Python 2).

I changed `urllib2` to `urllib` and it worked, based on this answer:

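from urllib.request import urlopen

def http_get_save(url, encoded_path):
    # Read the raw bytes; a PDF is binary, so decoding it as text would fail or corrupt it
    with urlopen(url) as response:
        body = response.read()
    # Write in binary mode so the file is saved byte-for-byte
    with open(encoded_path, 'wb') as f:
        f.write(body)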
Jesus is Lord
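Both answers call `response.read()`, which holds the entire PDF in memory before writing it. For large files, a variant that streams the body to disk in chunks with `shutil.copyfileobj` may be preferable. This sketch assumes a browser-like User-Agent is still needed to avoid the 403; `stream_to_file`, `BROWSER_UA`, and the 64 KiB chunk size are hypothetical choices, not values from the question.

```python
import shutil
import urllib.request

# Hypothetical browser User-Agent; assumes the server rejects urllib's default "Python-urllib/3.x"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def stream_to_file(url, out_path, chunk_size=64 * 1024):
    """Copy the response body to out_path in chunks instead of one big read().

    stream_to_file is a hypothetical helper, not part of urllib.
    """
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req) as response, open(out_path, "wb") as f:
        # copyfileobj reads and writes at most chunk_size bytes per iteration
        shutil.copyfileobj(response, f, chunk_size)
    return out_path
```

Memory use then stays bounded by `chunk_size` regardless of how large the PDF is.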