Using Python and Mechanize to submit form data and authenticate

Question

I want to log in to Reddit.com, navigate to a particular page, and submit a comment. I don't see what's wrong with this code, but it is not working: no change is reflected on the Reddit site.

import mechanize
import cookielib


def main():

    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follows refresh 0 but not hangs on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # Opens the site to be navigated
    r = br.open('http://www.reddit.com')
    html = r.read()

    # Select the second (index one) form
    br.select_form(nr=1)

    # User credentials
    br.form['user'] = 'DUMMYUSERNAME'
    br.form['passwd'] = 'DUMMYPASSWORD'

    # Login
    br.submit()

    # Open up comment page
    r = br.open('http://www.reddit.com/r/PoopSandwiches/comments/f47f8/testing/')
    html = r.read()

    # Text box is the 8th form on the page (which, I believe, is the text area)
    br.select_form(nr=7)

    # Change 'text' value to a testing string
    br.form['text'] = "this is an automated test"

    # Submit the information
    br.submit()

What's wrong with this?

Try adding a sleep of at least 10 seconds. You should also inspect (not 'View Source', but 'Inspect Element' in Chrome or similar in FF) the form in your browser and compare to the downloaded HTML. It might have fields dynamically filled by JS. — TryPyPy, Jan 18 '11 at 06:34
Hmm, let me try to add sleep. I'm not sure how to use API as there is no documentation for submitting comments. — Parseltongue, Jan 18 '11 at 07:25

score 19 · Accepted Answer · answered Jan 18 '11 at 17:59

I would definitely suggest trying to use the API if possible, but this works for me (not for your example post, which has been deleted, but for any active one):

#!/usr/bin/env python

import mechanize
import cookielib
import urllib

def main():

    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    r= br.open('http://www.reddit.com')

    # Select the second (index one) form
    br.select_form(nr=1)

    # User credentials
    br.form['user'] = 'user'
    br.form['passwd'] = 'passwd'

    # Login
    br.submit()

    # Open up comment page
    posting = 'http://www.reddit.com/r/PoopSandwiches/comments/f47f8/testing/'
    rval = 'PoopSandwiches'
    # you can get the rval in other ways, but this will work for testing

    r = br.open(posting)

    # You need the 'uh' value from the first form
    br.select_form(nr=0)
    uh = br.form['uh']

    br.select_form(nr=7)
    thing_id = br.form['thing_id']
    id = '#' + br.form.attrs['id']
    # The id that gets posted is the form id with a '#' prepended.

    data = {'uh':uh, 'thing_id':thing_id, 'id':id, 'renderstyle':'html', 'r':rval, 'text':"Your text here!"}
    new_data_dict = dict((k, urllib.quote(v).replace('%20', '+')) for k, v in data.iteritems())

    # not sure if the replace needs to happen, I did it anyway
    new_data = 'thing_id=%(thing_id)s&text=%(text)s&id=%(id)s&r=%(r)s&uh=%(uh)s&renderstyle=%(renderstyle)s' %(new_data_dict)

    # not sure which of these headers are really needed, but it works with all
    # of them, so why not just include them.
    req = mechanize.Request('http://www.reddit.com/api/comment', new_data)
    req.add_header('Referer', posting)
    req.add_header('Accept', 'application/json, text/javascript, */*')
    req.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
    req.add_header('X-Requested-With', 'XMLHttpRequest')
    cj.add_cookie_header(req)
    res = mechanize.urlopen(req)

main()

It would be interesting to turn JavaScript off and see how Reddit handles comments then. Right now a bunch of magic happens in an onsubmit function that is called when you make your post; this is where the uh and id values get added.
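As a side note, the hand-built query string in the script above could probably be produced in one call with urlencode, which percent-encodes each value and turns spaces into '+'. This is only a sketch: build_comment_body is a hypothetical helper name, and the field names are simply the ones the script already posts. The import is written so that it works on both Python 2 (which the script targets) and Python 3.

```python
try:
    from urllib import urlencode        # Python 2, as used by the script above
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_comment_body(uh, thing_id, form_id, subreddit, text):
    """Return the form-encoded POST body for the comment request.

    Hypothetical helper: it only mirrors the fields the script above sends.
    """
    fields = [
        ('thing_id', thing_id),
        ('text', text),
        ('id', '#' + form_id),   # the posted id is the form id with '#' prepended
        ('r', subreddit),
        ('uh', uh),
        ('renderstyle', 'html'),
    ]
    # urlencode escapes every value and joins the pairs with '&'
    return urlencode(fields)
```

The result could then be passed as the second argument to mechanize.Request in place of new_data.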

Wow. Thank you so much. I would have never figured that out. — Parseltongue, Jan 18 '11 at 19:09
Hmm... I'm getting this error on all active threads: ControlNotFoundError: no control matching name 'thing_id.' Any ideas? — Parseltongue, Jan 18 '11 at 19:36
Haha, no. You misinterpreted that sentence-- no matter which active thread I use this program on, it still triggers the error. The program I'm trying to make is for my own purposes. It posts relevant book chapters to a private subreddit I moderate. — Parseltongue, Jan 18 '11 at 23:54
Problem solved-- it was the [8]th form that contained thing_id. Thank you very much. — Parseltongue, Jan 19 '11 at 03:05
Hmmm... looks like thing_id is in different forms for different subreddits (an interesting problem!) Additionally, selecting a form with the wrong thing_id will post a response to somebody, rather than a new comment. — Parseltongue, Jan 19 '11 at 04:20
FWIW it seems to be form #12 at this point on the open-source reddit at least — dkuebric, Dec 16 '11 at 02:15
There is a cleaner way to prep your data for the url: `urllib.quote(string[, safe])` (http://stackoverflow.com/questions/1695183/how-to-percent-encode-url-parameters-in-python) — phyatt, Jun 01 '13 at 22:58
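
Since the comments above report that the form holding thing_id sits at a different index from page to page, one option is to look the form up by its controls instead of hard-coding nr=7. The following is a minimal sketch, not part of the original answer: forms_with_control is a hypothetical helper, and it assumes each form object exposes a controls list whose items have a name attribute, which is how mechanize's form objects are laid out. The stand-in objects are there only so the helper can be run without a network connection.

```python
from collections import namedtuple


def forms_with_control(forms, control_name):
    """Return the indices of all forms that contain a control with this name."""
    matches = []
    for index, form in enumerate(forms):
        if any(control.name == control_name for control in form.controls):
            matches.append(index)
    return matches


# Stand-in objects (hypothetical) that mimic the attributes used above
FakeControl = namedtuple('FakeControl', 'name')
FakeForm = namedtuple('FakeForm', 'controls')

sample = [
    FakeForm([FakeControl('q')]),
    FakeForm([FakeControl('user'), FakeControl('passwd')]),
    FakeForm([FakeControl('thing_id'), FakeControl('text')]),
]
print(forms_with_control(sample, 'thing_id'))  # prints [2]
```

With a live browser object this would presumably be called as forms_with_control(br.forms(), 'thing_id'), followed by br.select_form(nr=...) on the chosen index. More than one form may match, and as noted above, picking the wrong one posts a reply to someone rather than a new comment, so the thing_id value of each match is worth checking before submitting.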
