
I've seen this other question: How to use Python to login to a webpage and retrieve cookies for later usage?

However, a straightforward modification of that answer did not work for me, so I'm wondering how I can achieve my goal.

To give context, I'm trying to log into https://mog.com/hp/sign_in and then extract the names of my playlists from the following page: http://mog.com/my_mog/playlists

I think this should be straightforward for someone who knows what they are doing. Some basic code to log into the site and access a password-protected page would be great, and even better if you could explain in a sentence or two what each line does, so I can understand the code.

jonderry
  • What did you change? How do you know it didn't work for you? – Greg Hewgill Dec 11 '10 at 01:32
  • I changed the websites and login information. I know it didn't work because I printed out the links on the page and it wasn't the same links. It prints out the links from the registration page. – jonderry Dec 11 '10 at 01:41
  • Perhaps I'm not modifying this line correctly: `login_data = urllib.urlencode({'username' : username, 'j_password' : password})` but I don't know how to figure out what substitutions to make. – jonderry Dec 11 '10 at 01:46
  • "I don't know how to figure out what substitutions to make" Are you saying you don't know what the field names on the login form are? Did you look at the page with the login form to see what the `<form>` tags include for fields? – S.Lott Dec 11 '10 at 07:12
  • Yeah, I tried 'user[login]' and 'user[password]' but those did not work. – jonderry Dec 12 '10 at 02:29
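One way to check which field names a login form actually expects is to parse the page and list every named control inside each form, including hidden ones. The sketch below uses only the Python 3 standard library. The sample HTML is made up for illustration (in particular the hidden `authenticity_token` field is an assumption, not something taken from mog.com); in practice you would feed it the HTML of the real sign-in page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the named controls (input, select, textarea) of each form."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> found
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # controls without a name are not submitted
                self._current["fields"].append((name, attrs.get("type", tag)))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical sign-in page, for illustration only.
SAMPLE_HTML = """
<form action="/hp/sign_in" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="user[login]">
  <input type="password" name="user[password]">
  <input type="submit" value="Sign in">
</form>
"""

collector = FormFieldCollector()
collector.feed(SAMPLE_HTML)
for form in collector.forms:
    print(form["action"], form["fields"])
```

If the real form turns out to contain hidden fields, a hand-built POST that sends only the username and password may be rejected, which could explain why the right field names alone did not work.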

1 Answer


Try with mechanize:

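import mechanize

# Placeholders: replace these with your own credentials.
your_login = 'your_login'
your_password = 'your_password'

br = mechanize.Browser()               # browser object that keeps cookies between requests
br.open('https://mog.com/hp/sign_in')  # load the sign-in page
br.select_form(nr=0)                   # select the first form on the page (the login form)
br['user[login]'] = your_login         # fill in the login field
br['user[password]'] = your_password   # fill in the password field
br.submit()                            # submit the form; the session cookies are stored in br
# Fetch the protected page with the same cookies and save it to playlist.html
br.retrieve('http://mog.com/my_mog/playlists', 'playlist.html')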
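For comparison, here is a rough sketch of the same flow using only the Python 3 standard library. It assumes the site accepts a plain POST of the two visible fields to the sign-in URL, which has not been verified for mog.com; mechanize submits every control in the form, including hidden ones, so a hand-built request like this may need extra fields. The helper names (`make_opener`, `encode_form`, `login_and_fetch`) are made up for this example.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that stores cookies and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode a dict of form fields into a POST body (bytes)."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login_and_fetch(login_url, fields, protected_url):
    """POST the login fields, then fetch a protected page with the same cookies."""
    opener, _jar = make_opener()
    # Passing data= makes this a POST request.
    opener.open(login_url, data=encode_form(fields)).close()
    with opener.open(protected_url) as response:
        return response.read()


# Encoding the form needs no network access, so it can be checked locally.
body = encode_form({"user[login]": "alice", "user[password]": "s3cret"})
print(body)
```

A call would look like `login_and_fetch('https://mog.com/hp/sign_in', {...}, 'http://mog.com/my_mog/playlists')`, with the field dict adjusted to whatever the real form requires.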

EDIT:
To get your links, you can add this:

for link in br.links():
    print link.url, link.text

Or, starting from playlist.html, you could use BeautifulSoup and a regex:

from BeautifulSoup import BeautifulSoup
import re
soup = BeautifulSoup(file('playlist.html').read())
for link in soup.findAll('a', attrs={'href': re.compile("your matching re")}):
    print link.get('href')
systempuntoout
  • I got this to work and pull the html to a file, as written. If I want to retrieve some links matching a pattern from mog.com/my_mog/playlists, what's the command for that? I'm having trouble finding clear, easily searchable documentation for mechanize. – jonderry Dec 11 '10 at 02:54
  • @jonderry I don't think there is a command, you might wanna use regexp to match a pattern – Asterisk Dec 11 '10 at 05:08
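To make the regex suggestion concrete, here is a minimal Python 3 sketch that pulls out only the links whose `href` matches a pattern, using the standard library's `html.parser`. The sample HTML and the `/playlists/<number>` URL shape are invented for illustration; the real pattern depends on what the saved playlist.html actually contains.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> whose href matches a regex."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []
        self._href = None  # href of the matching link being read, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and self.pattern.search(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content, for illustration only.
SAMPLE_HTML = """
<a href="/playlists/123">Road trip</a>
<a href="/about">About</a>
<a href="/playlists/456">Workout <b>mix</b></a>
"""

collector = LinkCollector(r"^/playlists/\d+$")
collector.feed(SAMPLE_HTML)
for href, text in collector.links:
    print(href, text)
```

For the real page, you would read playlist.html from disk and pass its contents to `collector.feed()` instead of the sample string.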