
I'm using Python 2.7.5 to try to log in to a website. I need to log in and then navigate to several other pages to extract tables from them. For now, though, my problem is simply logging in. The form on the login page looks like this:

<form action="/session" class="text" method="post">
<div style="margin:0;padding:0;display:inline"><input name="authenticity_token" type="hidden" value="xeSbOkcWd444xhHyLj82wLS62qfH72De+7lwIhWFRd4=" /></div>
<p>
    <label for="login">Username</label><br />
    <input id="login" name="login" type="text" /><br />
    <label for="password">Password</label><br/>
    <input id="password" name="password" type="password" />
    <a href="/forgot_password">(Forgotten your password?)</a>
</p>

<p>
    <input id="remember_me" name="remember_me" type="checkbox" value="1" />
    <label class="shiftedlabel" for="remember_me">Remember me</label>
</p>

<p>
    <br /><input name="commit" type="submit" value="Log in" />
</p>
</form>

I have been using cookielib, urllib and urllib2 in the following code, which I took from this previous question and modified slightly:

import urllib, urllib2, cookielib

username = 'namehere'
password = 'passwordhere'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'Username' : username, 'password' : password, 'Remember_me' : "1", 'commit' : 'Log in'})
opener.open('http://example.org/login.php', login_data)
resp = opener.open('http://example.org/password_protected_page')
print resp.read()

I added two fields to the original login data: the "remember me" checkbox and the submit button ("commit").

When I run this code, I get a printout of the password-protected page, but it contains an error saying that I must be logged in to see the page, so I cannot see the table I need. Please note that the login page on the website does not have a .php extension; I don't know how much of a difference that makes, though.

On a related note, the other most common solution I found for this type of thing was to use the mechanize module. However, I was unable to install the "easy installer" tool it uses to install itself, and as I'm fairly new to this I wasn't able to diagnose the problem. That's a separate issue, though.

Thanks for any help :)

ElvinDrude
  • You should ***seriously*** consider looking into [**mechanize**](http://wwwsearch.sourceforge.net/mechanize/) for this sort of thing. – jedwards Jun 19 '13 at 23:33
  • Sorry, I realised I forgot to mention I had looked at mechanize, but was unable to install it, as I couldn't get the installer program to work in the first place. – ElvinDrude Jun 19 '13 at 23:39
  • Shouldn't "Remember me" in your urlencode actually be "remember_me"? – Dan Doe Jun 19 '13 at 23:43
  • Unfortunately this did not solve the problem. Editing it into the original though. – ElvinDrude Jun 19 '13 at 23:44

1 Answer

I'd suggest checking out the program Charles. It's great for finding the data that is sent to the server, and it's generally pretty straightforward to emulate that same request with urllib afterwards.

In your case it looks like you aren't adding the value of authenticity_token to your POST, the name "Remember me" is actually "remember_me", and the name "Username" is actually "login".
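As a rough sketch of what the corrected POST data could look like, assuming the field names are exactly the `name` attributes in the form from the question: `build_login_data` is a hypothetical helper, and the token value is only copied from the question's HTML as sample input (a real one has to come from a freshly loaded login page).

```python
# Works on Python 2.7 and Python 3: urlencode moved to urllib.parse in Python 3.
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3


def build_login_data(username, password, token):
    """Return the URL-encoded POST body for the login form.

    Hypothetical helper: the field names are taken from the `name`
    attributes of the <input> tags in the form shown in the question.
    """
    fields = [
        ('authenticity_token', token),  # hidden field, changes on every page load
        ('login', username),            # not 'Username'
        ('password', password),
        ('remember_me', '1'),           # not 'Remember_me'
        ('commit', 'Log in'),
    ]
    return urlencode(fields)


# Sample token copied from the question's HTML, for illustration only.
sample_token = 'xeSbOkcWd444xhHyLj82wLS62qfH72De+7lwIhWFRd4='
login_data = build_login_data('namehere', 'passwordhere', sample_token)
print(login_data)
```

Note also that the form's `action` is "/session", so the POST probably belongs at that path on the site (something like `opener.open('http://example.org/session', login_data)`, where the host is a placeholder) rather than at a `login.php` URL.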

Dan Doe
  • Unfortunately changing to "login" and "remember_me" is not working. I'll check out that program and see if that sheds any light on the issue. – ElvinDrude Jun 19 '13 at 23:51
  • You still need to add in authenticity_token. :) – Dan Doe Jun 19 '13 at 23:52
  • Ah, sorry, I missed that. I also don't know what this key actually is, and upon refreshing the source code in my browser I see that it changes upon each refresh. How do I go about adding it to the code? – ElvinDrude Jun 20 '13 at 00:00
  • You need to scrape it from the webpage. Consider using a regular expression. – Dan Doe Jun 20 '13 at 00:03
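
A minimal sketch of the regular-expression approach suggested above, using only the standard `re` module. `extract_authenticity_token` is a hypothetical helper, and the pattern assumes the `name` attribute comes before `value` inside the tag, as it does in the form from the question; a different attribute order would need a different pattern.

```python
import re

# Matches the hidden input and captures its value attribute.
# Assumption: name="authenticity_token" appears before value="..." in the tag.
TOKEN_RE = re.compile(r'name="authenticity_token"[^>]*\bvalue="([^"]*)"')


def extract_authenticity_token(html):
    """Return the token string found in the page source, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


# Sample input: the hidden field exactly as it appears in the question.
SAMPLE_HTML = (
    '<form action="/session" class="text" method="post">'
    '<div style="margin:0;padding:0;display:inline">'
    '<input name="authenticity_token" type="hidden" '
    'value="xeSbOkcWd444xhHyLj82wLS62qfH72De+7lwIhWFRd4=" /></div>'
)

token = extract_authenticity_token(SAMPLE_HTML)
print(token)
```

In the real script the HTML would come from loading the login page through the same cookie-aware opener that later sends the POST (for example `html = opener.open(login_page_url).read()`, where `login_page_url` is a placeholder), since the token changes on every page load and is presumably tied to the session cookie.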