
I am trying to log into a forum using Python requests. This is the forum I'm trying to log into: http://fans.heat.nba.com/community/

Here's my code:

import requests
import sys

URL = "http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login"

def main():
    session = requests.Session()

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'signin_options': 'submit',
        'redirect':'index.php?'
    }

    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    q = session.get('http://fans.heat.nba.com/community/index.php?app=members&module=messaging&section=view&do=showConversation&topicID=4314&st=20#msg26627')
    print(session.cookies)
    print(r.status_code)
    print(q.status_code)

if __name__ == '__main__':
    main()

The URL is the login page on the forums. With the 'q' variable, the session tries to access a page on the forums (the private messenger) that can only be accessed if you're logged in. However, that request returns status code 403 (Forbidden), which suggests that I was not logged in successfully.

Why am I unable to log in? In 'login_data', 'ips_username' and 'ips_password' are the names of the HTML form fields. However, I believe I have the actual log-in commands ('signin_options', 'redirect') wrong.

Can somebody guide me to the correct log-in commands please?


2 Answers


There is a hidden input in the form, auth_key:

<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />

So you need to extract its value and send it to the login page along with your credentials. You could simply use a regex:

import re
import requests

def main():
    session = requests.Session()

    # Get the login page that contains the auth_key. Use the session
    # (not requests.get) so any cookies set here are reused by the POST
    r = session.get("http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login")
    # Extract the value of the hidden auth_key field
    auth_key = re.findall("auth_key' value='(.*?)'", r.text)[0]

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'auth_key': auth_key,
    }

And the rest should be the same.
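If you want to check the extraction step without hitting the forum, it can be pulled out into a small helper. This is a minimal sketch: `build_login_data` is a hypothetical name, and it raises a clearer `ValueError` instead of the `IndexError` that `re.findall(...)[0]` gives when the field is missing.

```python
import re

# Same pattern as in the answer above: captures the text between
# "auth_key' value='" and the next single quote
AUTH_KEY_PATTERN = re.compile("auth_key' value='(.*?)'")

def build_login_data(login_page_html, username, password):
    """Hypothetical helper: extract auth_key from the login page HTML and
    return the form data to POST. Raises ValueError if no auth_key is found."""
    match = AUTH_KEY_PATTERN.search(login_page_html)
    if match is None:
        raise ValueError("no auth_key field found in the login page")
    return {
        'ips_username': username,
        'ips_password': password,
        'auth_key': match.group(1),
    }
```

In `main()` you would then call `login_data = build_login_data(r.text, 'username', 'password')` and pass the result to `session.post(...)` as before.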


As indicated by @Chaker in the comments, the login form requires you to send an auth_key that you need to read from an initial visit to a page first.

The auth_key is a hidden form field with a random value (generated and stored by the server), so every regular web browser sends that with the POST request. The server then validates the request and requires it to contain an auth_key that it knows is valid (by checking against its list of issued auth_keys). So the process needs to be as follows:

  • Visit the front page (or probably any page below it)
  • Read the value of the auth_key hidden field
  • Create a POST request that includes your credentials and that auth_key
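Why the first visit is needed can be illustrated with a toy model of the server-side check described above. This is entirely hypothetical (`ToyForumServer` is an invented class, and a real forum may generate, store and validate the key quite differently); it only shows why a POST carrying no previously issued key gets rejected:

```python
import re
import secrets

class ToyForumServer:
    """Hypothetical, heavily simplified model of the server side.
    It is NOT how this forum is actually implemented."""

    def __init__(self):
        self.issued_keys = set()  # every auth_key handed out so far

    def login_page(self):
        # Generate a random key, remember it, and embed it in a hidden field
        key = secrets.token_hex(16)
        self.issued_keys.add(key)
        return "<input type='hidden' name='auth_key' value='%s' />" % key

    def accepts(self, form_data):
        # Only accept a POST that echoes back a key this server issued
        # (credential checking is left out of this toy model)
        return form_data.get('auth_key') in self.issued_keys


server = ToyForumServer()

# Posting credentials without visiting the login page first: no valid key
print(server.accepts({'ips_username': 'username', 'ips_password': 'password'}))  # False

# Visit the login page, read the hidden field, then post it back
html = server.login_page()
auth_key = re.search("auth_key' value='(.*?)'", html).group(1)
print(server.accepts({'ips_username': 'username', 'ips_password': 'password',
                      'auth_key': auth_key}))  # True
```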

So this seems to work:

import re
import requests

USERNAME = 'username'
PASSWORD = 'password'

AUTH_KEY = re.compile(r"<input type='hidden' name='auth_key' value='(.*?)' \/>")

BASE_URL = 'http://fans.heat.nba.com/community/'
LOGIN_URL = BASE_URL + 'index.php?app=core&module=global&section=login&do=process'
SETTINGS_URL = BASE_URL + 'index.php?app=core&module=usercp'

payload = {
    'ips_username': USERNAME,
    'ips_password': PASSWORD,
    'rememberMe': '1',
    'referer': 'http://fans.heat.nba.com/community/',
}

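with requests.Session() as session:
    # First visit: read the hidden auth_key field from the front page
    response = session.get(BASE_URL)
    match = AUTH_KEY.search(response.text)
    if match is None:
        raise RuntimeError("auth_key field not found on %s" % BASE_URL)
    auth_key = match.group(1)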
    payload['auth_key'] = auth_key
    print("auth_key: %s" % auth_key)

    response = session.post(LOGIN_URL, data=payload)
    print("Login Response: %s" % response)

    response = session.get(SETTINGS_URL)
    print("Settings Page Response: %s" % response)

assert "General Account Settings" in response.text

Output:

auth_key: 777777774ea49e853634fbdc77777777
Login Response: <Response [200]>
Settings Page Response: <Response [200]>
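Building these URLs by plain string concatenation is easy to get wrong: `BASE_URL + '/index.php...'` produces a doubled slash. A sketch using the standard library's `urllib.parse.urljoin` (Python 3) instead; note the caveat that a leading slash in the relative part means "from the site root":

```python
from urllib.parse import urljoin

BASE_URL = 'http://fans.heat.nba.com/community/'

# Plain concatenation keeps both slashes
print(BASE_URL + '/index.php?app=core&module=usercp')
# http://fans.heat.nba.com/community//index.php?app=core&module=usercp

# urljoin resolves a relative path against the base URL
print(urljoin(BASE_URL, 'index.php?app=core&module=usercp'))
# http://fans.heat.nba.com/community/index.php?app=core&module=usercp

# ...but a leading slash means "from the site root", dropping /community/
print(urljoin(BASE_URL, '/index.php?app=core&module=usercp'))
# http://fans.heat.nba.com/index.php?app=core&module=usercp
```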

AUTH_KEY is a regular expression that matches any text that looks like <input type='hidden' name='auth_key' value='?????' \/> where ????? is a capturing group of zero or more characters (non-greedy, which means it takes the shortest possible match). The documentation on the re module should get you started with regular expressions. You can also test that regular expression on regex101 (https://regex101.com/r/dC9rF6/1), have it explained and toy around with it.
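To make `search()`, `group(1)` and the non-greedy `(.*?)` concrete, here is a small offline sketch on a made-up page fragment (the `page` string and the `GREEDY` pattern are illustrative only). It also shows why the pattern must be applied to `response.text` rather than `response.content`:

```python
import re

# Same pattern as AUTH_KEY in the answer
AUTH_KEY = re.compile(r"<input type='hidden' name='auth_key' value='(.*?)' \/>")

page = ("<input type='hidden' name='auth_key' value='abc123' />"
        "<input type='hidden' name='other' value='xyz' />")

match = AUTH_KEY.search(page)  # first match anywhere in the string, or None
print(match.group(0))  # the whole match: <input type='hidden' name='auth_key' value='abc123' />
print(match.group(1))  # only the first parenthesised group: abc123

# A greedy (.*) runs on to the LAST "' />" in the page instead
GREEDY = re.compile(r"<input type='hidden' name='auth_key' value='(.*)' \/>")
print(GREEDY.search(page).group(1))
# abc123' /><input type='hidden' name='other' value='xyz

# A str pattern cannot search bytes (such as response.content)
try:
    AUTH_KEY.search(page.encode('utf-8'))
except TypeError as exc:
    print("TypeError:", exc)
```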

Note: If you were to actually parse (X)HTML, you should always use an (X)HTML parser. However, for this quick and dirty way to extract the hidden form field, a non-greedy regex does the job just fine.
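For comparison, a sketch of the parser-based approach using only the standard library's `html.parser`. `HiddenInputParser` and `hidden_fields` are hypothetical names; it collects every hidden input, not just `auth_key`:

```python
from html.parser import HTMLParser

class HiddenInputParser(HTMLParser):
    """Collects name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of (name, value) pairs.
        # Self-closing tags like <input ... /> are routed here too by the default handler.
        if tag != 'input':
            return
        attributes = dict(attrs)
        if (attributes.get('type') or '').lower() == 'hidden' and attributes.get('name'):
            self.fields[attributes['name']] = attributes.get('value') or ''


def hidden_fields(html):
    """Return a dict of all hidden form fields found in html."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = ("<form><input type='text' name='ips_username' />"
          "<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />"
          "</form>")
print(hidden_fields(sample))  # {'auth_key': '880ea6a14ea49e853634fbdc5015a024'}
```

In the script above, `payload['auth_key'] = hidden_fields(response.text)['auth_key']` could replace the regex lookup; a `KeyError` then signals that the field was not on the page.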

  • Thanks for the reply. I understand a lot better now. However, when I use this code, it gives me an error on line 22: 'Traceback (most recent call last): File "C:/Users//Desktop/Python34/Python_exercises/requests/#5.py", line 22, in auth_key = AUTH_KEY.search(response.content).group(1) TypeError: can't use a string pattern on a bytes-like object' – Carnageta Aug 05 '15 at 19:08
    @Carnageta I think you should replace `response.content` with `response.text`. For more information check [doc](http://docs.python-requests.org/en/latest/user/quickstart/#binary-response-content). – Chaker Aug 05 '15 at 19:11
    @ChakerBenhamed indeed, that's exactly it. After all we're dealing with (possibly encoded) text and not bytes, so that's the right thing to do even with Python 2.x. – Lukas Graf Aug 05 '15 at 19:15
  • Thanks! This helps a lot! Question: how do I check whether or not I logged in successfully? Even if I type in the wrong username/password, it still returns 200. – Carnageta Aug 05 '15 at 19:19
    @Carnageta You could check for a `success key`, for example `logout` or `welcome back`, or check for the absence of a `failure key`, which in your case is `Username or password incorrect.` – Chaker Aug 05 '15 at 19:22
  • @Chaker, thanks for the info. How would I actually check for the absence of failure key? Sorry, I am kind of new to Python over the internet. – Carnageta Aug 05 '15 at 19:32
    @Carnageta turns out I had a bug in my earlier code (I wrote `ips_username` twice, also for the password field). Now I've added a check `assert "General Account Settings" in response.text`: that message only appears in the response to a request to the settings page if authentication was successful. – Lukas Graf Aug 05 '15 at 19:35
    @Carnageta the success / failure keys that Chaker is speaking of are not standardized in any way; they are just markers you pick yourself, depending on how the web application is implemented, to indicate success or failure. In my example I visit the settings page and check for the presence of the text `General Account Settings` in the response, but it really could be anything. – Lukas Graf Aug 05 '15 at 19:41
  • @Lukas Graf, thanks so much! This is exactly what I was looking for! – Carnageta Aug 05 '15 at 19:41
    You're welcome. Feel free to accept @Chaker's answer BTW, it should work just as well, and he was the first one to point out the `auth_key` ;-) – Lukas Graf Aug 05 '15 at 19:44
  • Question: I don't really understand the 'AUTH_KEY = re.compile' part. Is there a place I can go to get a better understanding of re? – Carnageta Aug 05 '15 at 19:44
    @Carnageta the documentation on the [`re` module](https://docs.python.org/2/library/re.html) should get you started. You can also test that regular expression [here](https://regex101.com/r/dC9rF6/1), have it explained and toy around with it. – Lukas Graf Aug 05 '15 at 19:49
  • Hi, another question. The line, 'auth_key = AUTH_KEY.search(response.text).group(1)', what does this piece do, and what does group(1) mean? Thank you! – Carnageta Aug 05 '15 at 22:20
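On the question of how to actually check for the presence or absence of such a marker: it is a plain substring test on the response text. A minimal sketch, using the two marker strings mentioned in the comments; the exact wording on the live forum may differ (version, language, theme), so treat both strings, and the hypothetical function names, as assumptions:

```python
# Marker strings taken from the comments; the exact wording depends
# on the forum, so treat them as assumptions
FAILURE_MARKER = 'Username or password incorrect.'
SUCCESS_MARKER = 'General Account Settings'

def login_failed(login_response_text):
    """True if the login response still contains the failure message."""
    return FAILURE_MARKER in login_response_text

def settings_page_visible(settings_response_text):
    """True if the settings-page marker is present (per the thread, only shown when logged in)."""
    return SUCCESS_MARKER in settings_response_text


print(login_failed('<p>Username or password incorrect.</p>'))       # True
print(settings_page_visible('<h1>General Account Settings</h1>'))  # True
```

In the script you would pass `response.text` from the `session.post(LOGIN_URL, ...)` and `session.get(SETTINGS_URL)` calls, since the status code alone (200 in both cases) does not tell you whether the login worked.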