
I'm trying to automatically download the first image that appears in a Google Images search, but I can't read the page source: the request fails with "HTTP Error 403: Forbidden". Any ideas? Thank you for your help!

Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

word = 'house'
r = urlopen('https://www.google.pl/search?&dcr=0&tbm=isch&q='+word)
data = r.read()
Wojciech

1 Answer


The website appears to be blocking the request because it looks like it comes from a bot (by default, urllib identifies itself as Python-urllib), so you need to send a browser-like User-Agent through the headers argument. I found an example of this in the question "HTTP error 403 in Python 3 Web Scraping".

Also, urlopen doesn't accept a headers argument, so I had to build a Request object and pass that to urlopen instead.

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

word = 'house'
r = Request('https://www.google.pl/search?&dcr=0&tbm=isch&q='+word, headers={'User-Agent': 'Mozilla/5.0'})
response = urlopen(r).read()
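
If the search term can contain spaces or other special characters, it may be worth URL-encoding it and catching the error in case the site still refuses the request. The following is only a sketch under those assumptions: the helper names build_search_request and fetch are made up for illustration, and a browser-like User-Agent is not guaranteed to be accepted.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(word, user_agent='Mozilla/5.0'):
    """Hypothetical helper: build the image-search request with a browser-like User-Agent."""
    # urlencode escapes spaces and special characters in the search term
    query = urlencode({'dcr': 0, 'tbm': 'isch', 'q': word})
    url = 'https://www.google.pl/search?' + query
    return Request(url, headers={'User-Agent': user_agent})


def fetch(request):
    """Hypothetical helper: return the response body as bytes, or None on an HTTP error."""
    try:
        with urlopen(request) as response:
            return response.read()
    except HTTPError as err:
        # A 403 here would mean the site still rejected the request
        print('Request failed with HTTP status', err.code)
        return None
```

It could then be used as data = fetch(build_search_request('house')), after which data can be handed to BeautifulSoup as before.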
Alex F