I am trying to log in to a website that seems to use Ajax. The HTML page I want to get has the same URL as the landing page; it just changes once you log in. Here's my code:

URL = 'http://www.pogdesign.co.uk/cat/'
payload = {' password': 'password', ' sub_login': 'Account Login', 'username': 'email'}

with requests.Session() as s:
    s.post(URL, data=payload)
    sock = urllib.urlopen(URL)
    psource = sock.read()

The page I get is the "not logged in page". I suspect I might have forgotten something about headers, or this is simply not how ajax works.

Thanks for your help!

Anton

2 Answers

It doesn't look like you sent the actual login request. Try something like:

import requests

URL = 'http://www.pogdesign.co.uk/cat/'
LOGIN_URL = 'http://www.pogdesign.co.uk/login/'  # Or whatever the login request URL is
payload = {'password': 'password', 'sub_login': 'Account Login', 'username': 'email'}

s = requests.Session()
s.post(LOGIN_URL, data=payload)  # the session stores any cookies set by the login
response = s.get(URL)            # same session, so those cookies are sent back
response.content
# >> your /cat/ content

The nice thing about Session is that it carries your cookies for you by default so once a session is authenticated it will continue working. I have an example at https://github.com/BWStearns/WhiteTruffleScraper which uses a session login.

You can find the login request URL by watching the traffic in developer tools and logging in.
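To make "carries your cookies" a bit more concrete, here is a rough standard-library sketch of the same bookkeeping (Python 3, no network access needed). requests keeps a cookie jar inside the Session; here the jar is explicit. The cookie name and value (session_id, abc123) are made up for illustration and are not what the site actually sets.

```python
import http.cookiejar
import urllib.request

PAGE_URL = "http://www.pogdesign.co.uk/cat/"  # URL from the question


def make_cookie(name, value, domain):
    """Build a simple host-only cookie (name and value are hypothetical)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# One jar shared by every request made through this opener.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Simulate what a successful login response would do: store a cookie.
jar.set_cookie(make_cookie("session_id", "abc123", "www.pogdesign.co.uk"))

# A request that goes through the jar gets the cookie attached...
with_jar = urllib.request.Request(PAGE_URL)
jar.add_cookie_header(with_jar)
cookie_header = with_jar.get_header("Cookie")

# ...while a bare request, like the urllib call in the question, has none.
without_jar = urllib.request.Request(PAGE_URL)
bare_header = without_jar.get_header("Cookie")
```

Requests sent with opener.open(...) would pick up and return cookies through the same jar, which is roughly what the Session does for you on every post() and get().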

BWStearns
  • Thanks for your answer. The issue is that the login URL is the same as the landing URL and as the URL I want to get, it is just refreshed through ajax. Here's the firebug log (http://hpics.li/d3955b0). – Anton Sep 22 '14 at 12:22
  • Ah, sorry about that. I saw /cat/ and assumed it was a [foo|bar] type URL and didn't look at the site itself. – BWStearns Sep 22 '14 at 17:55

You're posting your login with session.post but then trying to read the logged-in page with urllib. urllib doesn't know anything about your login (the session cookie, for example) unless you explicitly provide it. Also, when you post, you're not capturing the response. Even if you don't need that response, keep using the same session to request the page again.

response = s.post(URL, data=payload)
# response holds the HTTP status, cookie data and possibly the "logged in page" html.
# check `response.text` if that's the case. if it's only the authentication cookie...
logged_in_page = s.get(URL)

When you do s.get() using the same session, the cookies you got when logging in are re-sent with subsequent requests. Since it's AJAX, you need to check what additional data, headers or cookies the browser sends, and whether it uses GET or POST to retrieve the subsequent pages.

For the login post(), the login data may be sent as URL params, as posted form data, or in headers. Check which one your browser uses (dev tools --> "Network" in Firefox or Chrome).
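Putting those pieces together, a minimal sketch could look like the following. It assumes the login fields travel as posted form data and that the site wants the usual AJAX marker header. Both are assumptions to verify in the Network tab, and login_and_fetch and AJAX_HEADERS are illustrative names, not part of requests. The session argument is expected to behave like requests.Session.

```python
# Header many sites use to mark AJAX calls. Whether this particular site
# checks it is an assumption; compare with what your browser sends.
AJAX_HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def login_and_fetch(session, url, payload, extra_headers=None):
    """Post the login form, then fetch the same URL with the same session.

    `session` should offer post() and get() like requests.Session, returning
    responses that have raise_for_status() and a .text attribute.
    """
    headers = dict(AJAX_HEADERS)
    headers.update(extra_headers or {})

    # Log in; any cookies set by the response stay on the session.
    login_response = session.post(url, data=payload, headers=headers)
    login_response.raise_for_status()  # fail loudly on 4xx/5xx

    # Same session, so the login cookies are sent back automatically.
    page_response = session.get(url, headers=headers)
    page_response.raise_for_status()
    return page_response.text
```

With requests installed, the call would be something like login_and_fetch(requests.Session(), URL, payload), using the URL and payload from the question. If the browser turns out to send the credentials as query params instead, the data= argument would become params=.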

Also, don't wrap the session in a with block here, because the session is closed as soon as you exit that block. You probably want your session s to last longer than just the login, since it's managing your cookies, etc.

aneroid
  • Thanks for the answer. I tried with .get before and it doesn't work. Here's the Firebug info on the post: http://hpics.li/d3955b0. I don't know if I must pass all the headers and the cookies as in the image. I also can't seem to find which GET or POST request refreshes the page after login... – Anton Sep 22 '14 at 12:19
  • I found a solution using headers and a get to retrieve the refreshed page, thanks ! – Anton Sep 22 '14 at 14:07
  • glad it worked out. btw, it's strange that the response code is 302 for a successful login. Maybe for a new after-login page. – aneroid Sep 22 '14 at 15:47