I need to fill in form values on a target page and then click a button, using Python. I've looked at Selenium and Windmill, but these are testing frameworks, and I'm not testing. I'm trying to log into a third-party website programmatically, then download and parse a file we need to insert into our database. The problem with the testing frameworks is that they launch browser instances; I just want a script I can schedule to run daily to retrieve the page I want. Is there any way to do this?
5 Answers
33
You are looking for Mechanize
Form submitting sample:
import re
from mechanize import Browser
br = Browser()
br.open("http://www.example.com/")
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__)
response = br.submit() # submit current form

Vinko Vrsalovic
- I'm stuck using Python 2.6 though, so sadly Mechanize isn't an option either. (GopherError dropped in 2.6, looks like). – Habaabiai Oct 12 '09 at 15:35
- Mechanize doc is usually a bit terse, but it works really really great ! – Philippe F Oct 12 '09 at 15:35
- I think you should insist, try debugging the gopher problem. In python 2.6, gopher support was removed IIRC, so fixing your problem is probably about commenting some import gopherlib and the few spots where gopher is actually used. – Philippe F Oct 12 '09 at 15:38
- @Habaabiai: Mechanize advertises working in 2.6, you could ask a question about your problem with it. Also, you can try urllib2 (which will force you to write more code to submit a form.) – Vinko Vrsalovic Oct 12 '09 at 15:40
- It seems Mechanize does not support python 3 (yet...?), I guess it means it is not maintained anymore (it is the first FAQ at http://wwwsearch.sourceforge.net/mechanize/faq.html). – DarkLight May 06 '20 at 06:43
20
Have a look at this example, which uses Mechanize; it gives the basic idea:
#!/usr/bin/python
import re
from mechanize import Browser
br = Browser()
# Ignore robots.txt
br.set_handle_robots( False )
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]
# Retrieve the Google home page, saving the response
br.open( "http://google.com" )
# Select the search box and search for 'foo'
br.select_form( 'f' )
br.form[ 'q' ] = 'foo'
# Get the search results
br.submit()
# Find the link to foofighters.com; why did we run a search?
resp = None
for link in br.links():
    siteMatch = re.compile( r'www\.foofighters\.com' ).search( link.url )
    if siteMatch:
        resp = br.follow_link( link )
        break
# Print the site, if a matching link was found
if resp is not None:
    content = resp.get_data()
    print( content )

RATHI
8
You can use the standard urllib library to do this like so:
import urllib
urllib.urlretrieve("http://www.google.com/", "somefile.html", lambda x,y,z:0, urllib.urlencode({"username": "xxx", "password": "pass"}))
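The call above is Python 2 only. On Python 3 the same idea could look roughly like the sketch below, which also keeps cookies between requests so that a login can be followed by a download. The function names, the URLs, and the form field names are hypothetical; the real field names have to be read from the target site's login form, and a site that relies on JavaScript or hidden tokens will need more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener_with_cookies():
    """Return an opener that remembers cookies across requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_form_request(url, fields):
    """Build a POST request carrying URL-encoded form fields."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Passing data makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=data)


def login_and_download(login_url, file_url, fields, dest_path):
    """Submit the login form, then save file_url using the same session."""
    opener, _ = build_opener_with_cookies()
    with opener.open(build_form_request(login_url, fields)) as resp:
        resp.read()  # session cookies are now stored in the jar
    with opener.open(file_url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
```

A scheduled script would then call something like login_and_download("http://www.example.com/login", "http://www.example.com/report.csv", {"username": "xxx", "password": "pass"}, "report.csv"), with both URLs being placeholders.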

Clueless
4
The Mechanize example as suggested seems to work. In input fields where you must enter text, use something like:
br["kw"] = "rowling" # (the method here is __setitem__)
If some content is generated after you submit the form, as in a search engine, you get it via:
print response.read()

Abhranil Das
4
For checkboxes, use 1 and 0 as true and false respectively:
br["checkboxname"] = 1 #checked = true
br["checkboxname2"] = 0 #checked = false

Ritesh Khandekar