Fill form values in a web page via a Python script (not testing)

Question

I need to fill in form values on a target page and then click a button, all from Python. I've looked at Selenium and Windmill, but these are testing frameworks and I'm not testing. I'm trying to log into a third-party website programmatically, then download and parse a file that we need to insert into our database. The problem with the testing frameworks is that they launch browser instances; I just want a script I can schedule to run daily to retrieve the page I want. Is there any way to do this?

score 33 · Answer 1 · answered Oct 12 '09 at 15:31

You are looking for Mechanize.

Form submission sample:

import re
from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
response = br.submit()  # submit current form
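
To see what the list assignment above amounts to on the wire, here is a minimal standard-library sketch (Python 3, not Mechanize itself). It assumes the form is submitted as application/x-www-form-urlencoded; `encode_form` is a hypothetical helper name, and the field name simply mirrors the sample above.

```python
from urllib.parse import urlencode


def encode_form(fields):
    """URL-encode form fields; list values become repeated keys."""
    # doseq=True expands {"k": ["a", "b"]} into k=a&k=b, which is how a
    # multi-select or checkbox group is sent in a form body.
    return urlencode(fields, doseq=True)


body = encode_form({"cheeses": ["mozzarella", "caerphilly"]})
print(body)
```

Each selected item is sent as its own `cheeses=...` pair, which is why the value assigned to a list control is a list rather than a single string.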

Vinko Vrsalovic

I'm stuck using Python 2.6 though, so sadly Mechanize isn't an option either. (GopherError dropped in 2.6, looks like). – Habaabiai Oct 12 '09 at 15:35
Mechanize doc is usually a bit terse, but it works really really great ! – Philippe F Oct 12 '09 at 15:35
I think you should insist, try debugging the gopher problem. In python 2.6, gopher support was removed IIRC, so fixing your problem is probably about commenting some import gopherlib and the few spots where gopher is actually used. – Philippe F Oct 12 '09 at 15:38
@Habaabiai: Mechanize advertises working in 2.6, you could ask a question about your problem with it. Also, you can try urllib2 (which will force you to write more code to submit a form.) – Vinko Vrsalovic Oct 12 '09 at 15:40
It seems Mechanize does not support python 3 (yet...?), I guess it means it is not maintained anymore (it is the first FAQ at http://wwwsearch.sourceforge.net/mechanize/faq.html). – DarkLight May 06 '20 at 06:43

RATHI · Answer 2 · 2017-08-18T10:41:29.630

Have a look at this example, which uses Mechanize; it gives the basic idea:

#!/usr/bin/python
import re
from mechanize import Browser

br = Browser()

# Ignore robots.txt
br.set_handle_robots(False)
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]

# Retrieve the Google home page
br.open("http://google.com")

# Select the search box and search for 'foo'
br.select_form('f')
br.form['q'] = 'foo'

# Get the search results
br.submit()

# Find the link to foofighters.com; why did we run a search?
resp = None
for link in br.links():
    # Escape the dots so they match literally
    if re.search(r'www\.foofighters\.com', link.url):
        resp = br.follow_link(link)
        break

# Print the site, if a matching link was found
if resp is not None:
    print(resp.get_data())

Clueless · Answer 3 · 2009-10-12T15:55:19.677

8

You can use the standard urllib library to do this (Python 2 API shown), like so:

import urllib

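# POST the form fields and save the response body to somefile.html.
# The third argument is a no-op progress hook; passing data makes this a POST.
form_data = urllib.urlencode({"username": "xxx", "password": "pass"})
urllib.urlretrieve("http://www.google.com/", "somefile.html", lambda x, y, z: 0, form_data)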
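
On Python 3 the same idea lives in `urllib.request` and `urllib.parse`, and a login usually also needs cookies kept between requests. A minimal sketch, assuming the site uses a plain form POST and session cookies; the URLs, the field names and the helper names `make_opener`, `build_login_request` and `login_and_download` are placeholders for illustration, not anything taken from the target site:

```python
import shutil
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_opener():
    """Return an opener that remembers cookies across requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def build_login_request(login_url, fields):
    """Build a POST request carrying the URL-encoded form fields."""
    data = urlencode(fields).encode("ascii")  # a bytes body makes it a POST
    return Request(login_url, data=data)


def login_and_download(login_url, fields, file_url, dest_path):
    """Log in, then save file_url to dest_path using the same session."""
    opener = make_opener()
    # Submit the login form; the cookie jar keeps any session cookie.
    opener.open(build_login_request(login_url, fields)).close()
    # Fetch the file with the same opener so the cookie is sent along.
    with opener.open(file_url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


# Hypothetical usage (placeholder URLs and field names):
# login_and_download("https://example.com/login",
#                    {"username": "xxx", "password": "pass"},
#                    "https://example.com/report.csv", "report.csv")
```

This will not help if the login form relies on JavaScript or hidden per-session tokens; in that case the hidden fields have to be read from the page first, which is the part Mechanize automates.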

edited Oct 12 '09 at 15:55

answered Oct 12 '09 at 15:48

Abhranil Das · Answer 4 · 2011-04-16T09:59:33.760

4

The Mechanize example as suggested seems to work. For input fields that take text, use something like:

br["kw"] = "rowling"  # (the method here is __setitem__)

If some content is generated after you submit the form, as in a search engine, you can read it via:

print(response.read())

edited Apr 16 '11 at 09:59

answered Apr 16 '11 at 09:32

score 4 · Answer 5 · answered Aug 31 '19 at 05:32

For checkboxes, use 1 & 0 as true & false respectively:

br["checkboxname"] = 1 #checked = true
br["checkboxname2"] = 0 #checked = false

Ritesh Khandekar