
I'm trying to get all the information shown in a browser's "Inspect" panel (for example in Chrome). Currently I can get the page source, but it doesn't contain everything that Inspect shows.

When I tried using

    with urllib.request.urlopen(section_url) as url:
        html = url.read()

I got the following error message: "urllib.error.HTTPError: HTTP Error 403: Forbidden"

Now I'm assuming this is because the URL I'm trying to fetch uses https instead of http, and I was wondering if there is a specific way to get that information over https, since the normal methods aren't working.

Note: I've also tried this, but it didn't show me everything:

    f = requests.get(url)
    print(f.text)

  • "Inspect" just shows you where in the source a certain thing is. To implement your own, you would need to make a GUI (your own browser). – Frogboxe Jan 28 '17 at 18:20
  • Also, 403 means that the site refused to send data back. Perhaps you don't have access rights. – Frogboxe Jan 28 '17 at 18:22

1 Answer

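You need to send a User-Agent header so that the server thinks the request comes from a browser and not a robot. The 403 is the server refusing the request; it has nothing to do with https versus http.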

    import urllib.request

    url = 'http://www.google.com'  # Replace with your URL
    user_agent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3'
    headers = {'User-Agent': user_agent}

    # Attach the header so the server sees a browser-like client
    req = urllib.request.Request(url, None, headers)
    with urllib.request.urlopen(req) as response:
        html = response.read()

adapted from https://stackoverflow.com/a/3949760/6622817
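
If you want to reuse this, here is a minimal sketch that wraps the same idea in two small functions and reports an HTTP error instead of crashing. The names `build_request`, `fetch_html` and `DEFAULT_USER_AGENT` are my own, not part of `urllib`, and the User-Agent string is only an example value. Some sites check more than the User-Agent, so this may still get a 403.

```python
import urllib.error
import urllib.request

# Example value only (an assumption); any current browser's User-Agent string should work.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_USER_AGENT):
    """Return the decoded page body, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent)) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # For example: 403 Forbidden when the server still refuses the request.
        print(f"Server refused the request: {err.code} {err.reason}")
        return None
```

Calling `fetch_html(section_url)` then gives you the page text as a `str`, or `None` if the server refused.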

Taku