Python 3, urlopen - HTTP Error 403: Forbidden

Question

I'm trying to automatically download the first image that appears in a Google image search, but I can't read the page source: the request fails with "HTTP Error 403: Forbidden". Any ideas? Thank you for your help!

Here's my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

word = 'house'
r = urlopen('https://www.google.pl/search?&dcr=0&tbm=isch&q='+word)
data = r.read()

Maybe Google doesn't like the default user agent sent by the `urlopen()` — Michael Butscher, Dec 01 '17 at 13:24
Possible duplicate https://stackoverflow.com/questions/16627227/http-error-403-in-python-3-web-scraping — ababuji, Dec 01 '17 at 13:25

score 7 · Accepted Answer · answered Dec 01 '17 at 13:26

Apparently you have to pass a headers argument with a browser-like User-Agent, because the website blocks the request when it thinks a bot is asking for data. I found an example of this in the question "HTTP error 403 in Python 3 Web Scraping".

Also, urlopen() does not accept a headers argument, so I had to build a Request object and pass that to urlopen() instead.

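from urllib.request import urlopen, Request
from bs4 import BeautifulSoup  # for parsing the HTML afterwards (not used in this snippet)

word = 'house'
# Attach a browser-like User-Agent; urlopen() itself takes no headers argument.
r = Request('https://www.google.pl/search?&dcr=0&tbm=isch&q='+word, headers={'User-Agent': 'Mozilla/5.0'})
# urlopen() accepts the Request object; read() returns the page source as bytes.
response = urlopen(r).read()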
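
If the search word can contain spaces or non-ASCII characters, it may help to build the query string with urlencode rather than plain concatenation. The following is only a sketch under that assumption: the helper names build_search_request and fetch are hypothetical (not part of urllib), the 10-second timeout is an arbitrary choice, and whether Google keeps accepting this User-Agent is not guaranteed.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(word, user_agent='Mozilla/5.0'):
    """Build (but do not send) an image-search request for `word`.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes spaces and special characters in the parameters.
    query = urlencode({'dcr': 0, 'tbm': 'isch', 'q': word})
    url = 'https://www.google.pl/search?' + query
    # Headers go on the Request object, since urlopen() has no headers argument.
    return Request(url, headers={'User-Agent': user_agent})


def fetch(request):
    """Send the request; return the body as bytes, or None on an HTTP error."""
    try:
        with urlopen(request, timeout=10) as response:
            return response.read()
    except HTTPError as err:
        # A 403 here would mean the server still rejects the request.
        print('Request failed:', err.code, err.reason)
        return None


req = build_search_request('red house')
print(req.full_url)
print(req.get_header('User-agent'))  # Request stores header names capitalized
```

Calling fetch(req) then performs the actual download. Keeping request construction separate from sending makes it possible to inspect the URL and headers without touching the network.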

Thank you @Alex, it solves my problem! Now I'm able to read the website source :) — Wojciech, Dec 03 '17 at 13:32
