I need to access the HTML source code of a webpage, but it requires authentication. How can I pass my username and password and fetch the page using Python?

The problem is that, for example, if I do this:

import requests
url='http://cuherp.chalkpad.in//Interface/index.php'
url_in='http://cuherp.chalkpad.in//Interface/Student/scIndex.php'
u='b1300*****'
p='jang******'
params={'Username':u,
        'Password':p,
        'Institute':'CSOET',
        'Session':'2013-14'}
resp_1=requests.get(url,auth=(u,p))
resp_2=requests.get(url_in,auth=(u,p),cookies=resp_1.cookies)

Here "url" is the login page and "url_in" is the page whose HTML I need.
But after running this, "resp_2.url" returns "url" itself, i.e. the login page, which means I am still not logged in. Please help.

mojozinc

1 Answer

A very convenient way, in my opinion, is to use Selenium WebDriver to remote-control your browser for this task. Some may say it is overkill to use a whole testing framework for this purpose, but it is as simple as shown here: how-to-submit-http-authentication-with-selenium-python-binding-webdriver

If you prefer to stick with requests, you might want to use RoboBrowser, a library built on top of requests and BeautifulSoup that offers mechanize-style form handling (here is an example from the docs):

from robobrowser import RoboBrowser

browser = RoboBrowser()
browser.open('http://twitter.com')

# Get the signup form
signup_form = browser.get_form(class_='signup')
signup_form         # <RoboForm user[name]=, user[email]=, ...

# Inspect its values
signup_form['authenticity_token'].value     # 6d03597 ...

# Fill it out
signup_form['user[name]'].value = 'python-robot'
signup_form['user[user_password]'].value = 'secret'

# Serialize it to JSON
signup_form.serialize()         # {'data': {'authenticity_token': '6d03597...',
                                #  'context': '',
                                #  'user[email]': '',
                                #  'user[name]': 'python-robot',
                                #  'user[user_password]': ''}}

# And submit
browser.submit_form(signup_form)

Beautiful Soup is also included in RoboBrowser, so you can immediately start parsing the source after login.
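
If you would rather stay with plain requests, the usual pattern is to POST the login form through a requests.Session and then fetch the protected page with the same session, since auth=(u, p) only sends HTTP basic auth and does not submit a form. The following is only a minimal sketch: it assumes the site uses an ordinary form-based login that accepts the field names from the question's params dict, and fetch_after_login is a hypothetical helper name, not part of any library. The real form may need extra hidden fields or a different submit URL, so check the login page's HTML first.

```python
import requests


def fetch_after_login(login_url, target_url, form_data, session=None):
    """POST the login form, then GET the protected page with the same session.

    Hypothetical helper: assumes a plain form-based login whose field
    names match the keys of form_data.
    """
    if session is None:
        session = requests.Session()

    # Submit the credentials as form fields rather than via auth=(u, p).
    login_resp = session.post(login_url, data=form_data)
    login_resp.raise_for_status()

    # The session re-sends any cookies set during login automatically.
    page = session.get(target_url)
    page.raise_for_status()
    return page
```

Called as fetch_after_login(url, url_in, params) with the variables from the question, the returned response's .url should be url_in rather than the login page if the login was accepted; if it still points at the login page, the form probably expects different fields.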

barrios