I'm trying to download images off a website but I keep getting this error:

HTTP Error 403: Forbidden

This is the function I created to do this:

    import requests
    import urllib.request
    from bs4 import BeautifulSoup

    def download_images(url,knife):
      '''
      download_images is a function which will extract pictures of the knives in csgo
      url is the page from which the images will be extracted
      images of 'knife' will be downloaded
      '''

      page = requests.get(url)

      #Use beautifulsoup to extract the image urls
      soup = BeautifulSoup(page.content, 'html.parser')

      #Pull all image tags from the page that have an alt attribute
      for img in soup.find_all('img', alt=True):
        #Find the url and labels of the knives
        if knife in img['alt']:
          #Download the images with the correct labels
          urllib.request.urlretrieve(img['src'],'{}.png'.format(img['alt']))

1 Answer


You should change the user agent. There are many user agents that you can use; a list of user agents is available here. To make urllib use a different user agent, you can add code like the sketch below. Alternatively, you could use wget with the -U option followed by a user-agent string (for example, 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4').

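A minimal sketch of the urllib change, assuming the server only checks the User-Agent header (the agent string below is just an example; any current browser string should work):

    import urllib.request

    #Install a global opener whose User-Agent looks like a browser, so that
    #urlretrieve() no longer sends the default 'Python-urllib/x.y' agent
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent',
                          'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) '
                          'Gecko/20070802 SeaMonkey/1.1.4')]
    urllib.request.install_opener(opener)

    #After this, your existing call can stay as it is:
    #urllib.request.urlretrieve(img['src'], '{}.png'.format(img['alt']))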

Implementing WGET

    import os
    import requests
    from bs4 import BeautifulSoup

    def download_images(url,knife):
      '''
      download_images is a function which will extract pictures of the knives in csgo
      url is the page from which the images will be extracted
      images of 'knife' will be downloaded
      '''

      page = requests.get(url)

      #Use beautifulsoup to extract the image urls
      soup = BeautifulSoup(page.content, 'html.parser')

      #Pull all image tags from the page that have an alt attribute
      for img in soup.find_all('img', alt=True):
        #Find the url and labels of the knives
        if knife in img['alt']:
          #Download the image with wget, spoofing a browser User-Agent and
          #saving the file under the image's alt text
          cmd = ("wget --convert-links -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) "
                 "Gecko/20070802 SeaMonkey/1.1.4' -O '{}.png' '{}'")
          os.system(cmd.format(img['alt'], img['src']))
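
If you would rather not shell out to wget at all, a rough equivalent using only requests (assuming the server merely rejects the default Python user agent) is to send a browser-like User-Agent on the image request itself and write the bytes to disk:

    import requests
    from bs4 import BeautifulSoup

    #Example browser-like agent; any current browser string should work
    HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) '
                             'Gecko/20070802 SeaMonkey/1.1.4'}

    def download_images(url, knife):
      page = requests.get(url, headers=HEADERS)
      soup = BeautifulSoup(page.content, 'html.parser')
      for img in soup.find_all('img', alt=True):
        if knife in img['alt']:
          #Fetch the image with the same headers, then save the bytes
          resp = requests.get(img['src'], headers=HEADERS)
          with open('{}.png'.format(img['alt']), 'wb') as f:
            f.write(resp.content)

Note that setting the header only on the initial requests.get(url, ...) call does not help urllib.request.urlretrieve, which still sends urllib's default agent when it fetches the image, so the 403 can persist after that change alone.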
  • I tried changing the user agent by doing the following: page = requests.get(url,headers={ 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36' }) but that did not seem to work – Hashim Abu Sharkh Aug 09 '19 at 19:03
  • Should I add this line: urllib.request.urlretrieve(img['src'],'{}.png'.format(img['alt'])) after implementing WGET? WGET did not seem to work either. – Hashim Abu Sharkh Aug 09 '19 at 19:08
  • @HashimAbuSharkh Do you have WGET installed on your computer? – ds_secret Aug 09 '19 at 19:21
  • @HashimAbuSharkh I tried to WGET https://csgostash.com/img/weapons/s/navaja_knife.png, and it did work without giving me a 403. – ds_secret Aug 09 '19 at 19:24
  • Yes, I installed it earlier using pip3 install wget, but how did you manage to make it work? Can you post your code, please? – Hashim Abu Sharkh Aug 09 '19 at 22:30
  • @HashimAbuSharkh I did it on the command line. Try it on the command line first. Also, wget on PyPI is not the original GNU WGET. Type `wget` into your command line to see if it is installed. If WGET is installed, type `wget --convert-links -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' https://csgostash.com/img/weapons/s/navaja_knife.png` into the command line and see if it downloads. If WGET is not installed, you should install it from https://www.gnu.org/software/wget/ and then run the above command. – ds_secret Aug 12 '19 at 16:32