29

Requests is the module I'm trying to use, and there is a form I'm trying to fill out automatically. I'd like to use Requests over Mechanize because with Mechanize I have to load the login page first before I can fill it out and submit, whereas with Requests I can skip the loading stage and go straight to POSTing the form data (hopefully). Basically, I'm trying to make the login process consume as little bandwidth as possible.

My second question is, after the login process and the redirection, is it possible to not fully download the whole page, but to only retrieve the page title? Basically, the title alone will tell me if the login succeeded or not, so I want to minimize bandwidth usage.

I'm kind of a noob when it comes to HTTP requests and whatnot, so any help would be appreciated. FYI, this is for a school project.

Edit: The first part of the question has been answered. My question now concerns the second part.

Jeremy
Display Name
    You can use the Chrome inspector to see what values are getting passed in to the post request created by the browser and then go from there. – bossylobster Oct 30 '12 at 21:41

2 Answers

43

Some example code:

import requests

URL = 'https://www.yourlibrary.ca/account/index.cfm'
payload = {
    'barcode': 'your user name/login',
    'telephone_primary': 'your password',
    'persistent': '1'  # remember me
}

# Post through the session so that any cookies set by the login are kept
session = requests.Session()
r = session.post(URL, data=payload)
print(r.cookies)

The first step is to look at the source of your page and identify the form element that is being submitted (use the Firebug, Chrome or IE developer tools, or just read the source). Then find the input elements and identify the required name attributes (see above).
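If you would rather list the field names programmatically than read them off in the browser, something like the following might work. It is only a sketch using the standard library's `html.parser`; `FormFieldCollector` and `collect_form_fields` are hypothetical names, and the sample markup in the usage note is made up to match the payload keys above, not copied from the real page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects the first form's action and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            # Where the browser would send the POST
            self.action = attrs.get("action")
        elif tag in ("input", "textarea", "select") and attrs.get("name"):
            # Keep any default value; missing values become empty strings
            self.fields[attrs["name"]] = attrs.get("value") or ""


def collect_form_fields(html):
    """Return (action, {field name: default value}) for the given HTML text."""
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.action, collector.fields
```

For example, calling `collect_form_fields()` on a form containing `<input name="barcode">` and `<input name="persistent" value="1">` would give a dict with those two keys, which you can then fill in and pass as `data=`. Note that this requires downloading the login page once, so it is something to do while developing rather than on every login.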

The URL you provided happens to have a "Remember Me" option, which, although I haven't tried it (because I can't), implies it'll issue a cookie for a period of time to avoid further logins -- that cookie is kept in the requests session.

Then just use session.get(someurl, ...) to retrieve pages etc...
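As for the second part of the question (reading only the page title), one possible approach is to make that follow-up request with `stream=True` and stop reading once the closing `</title>` tag has arrived. The sketch below is untested against the site in question; `read_title`, `fetch_title` and the 64 KB cap are hypothetical choices, and it assumes the title sits near the top of the page and decodes as UTF-8.

```python
import re

# Matches the contents of the first <title> element in raw bytes
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def read_title(chunks, max_bytes=65536):
    """Return the <title> text from an iterable of byte chunks, or None."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        match = TITLE_RE.search(buffer)
        if match:
            return match.group(1).decode("utf-8", errors="replace").strip()
        if len(buffer) >= max_bytes:
            break  # give up rather than download the whole page
    return None


def fetch_title(session, url):
    """Stream `url` through a requests session and stop once the title is seen."""
    # stream=True defers downloading the body until it is iterated over
    response = session.get(url, stream=True)
    try:
        return read_title(response.iter_content(chunk_size=1024))
    finally:
        # Closing early abandons the rest of the body
        response.close()
```

With the logged-in `session` from the example above, `fetch_title(session, someurl)` should return the title string, or `None` if no title shows up within the cap. The server may already have sent somewhat more than was read, so the saving is approximate.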

Jon Clements
  • I tried that but it didn't seem to authenticate me, although it worked using Mechanize. Do you know what could be wrong? **edit** Sorry, actually it worked. I just made a typo :) – Display Name Oct 30 '12 at 22:04
  • You're a lifesaver. I thought I'd have to wade through coldfusion crap all day. Ended up taking 15 minutes to do 8 hours of manual downloading! – Blairg23 Dec 21 '15 at 22:58
  • so how do i send a file along ? – Yash Kumar Verma Mar 19 '17 at 13:51
  • ```session = requests.session() r = session.post(URL, data=payload)``` you declared `session` – LeonF Sep 04 '20 at 19:33
17

In order to use authentication within a requests get or post function you just supply the auth argument. Like this:

response = requests.get(url, auth=('username', 'password'))

Refer to the Requests Authentication Documentation for more detailed info.
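It may help to know what a `(username, password)` tuple in `auth` actually does: Requests treats it as HTTP Basic authentication and sends the credentials in an `Authorization` header, which is separate from filling in an HTML login form. The standard-library sketch below only illustrates the shape of that header; `basic_auth_header` is a hypothetical helper, not part of Requests.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    # Credentials are joined with a colon and base64-encoded (not encrypted)
    credentials = f"{username}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("username", "password"))
```

If the site only offers a login form, as in the question, the server will most likely ignore this header, and the credentials need to go in the POST data instead.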

Using Chrome's developer tools, you can inspect the elements of the HTML page that contains the form you would like to fill out and submit. For an explanation of how this is done go here. There you can find the data that you need to populate your POST request's data argument. If you are not worried about verifying the security certificate of the site you are accessing, you can also pass verify=False in the argument list of the get call.

If your html page has these elements to use for your web form posting:

<textarea id="text" class="wikitext" name="text" cols="80" rows="20">
This is where your edited text will go
</textarea>
<input type="submit" id="save" name="save" value="Submit changes">

Then the python code to post to this form is as follows:

import requests
from bs4 import BeautifulSoup

url = "http://www.someurl.com"

username = "your_username"
password = "your_password"

response = requests.get(url, auth=(username, password), verify=False)

# Getting the text of the page from the response data
# (naming the parser explicitly avoids BeautifulSoup's "no parser" warning)
page = BeautifulSoup(response.text, "html.parser")

# Finding the text contained in a specific element, for instance, the 
# textarea element that contains the area where you would write a forum post
txt = page.find('textarea', id="text").string

# Finding the value of a specific attribute with name = "version" and 
# extracting the contents of the value attribute
tag = page.find('input', attrs = {'name':'version'})
ver = tag['value']

# Changing the text to whatever you want
txt = "Your text here, this will be what is written to the textarea for the post"

# Construct the POST request
form_data = {
    'save': 'Submit changes',
    'text': txt
}

post = requests.post(url, auth=(username, password), data=form_data, verify=False)
Community
Moot