The point of this project is simple, but pointers from anyone who feels they have something to add would be appreciated.

Purpose: The application's purpose is to log into an account on Myth-Weavers (https://www.myth-weavers.com/) and return the names of all Dungeons and Dragons sheets that have been created on the account.

The app should also be able to take a direct link (https://www.myth-weavers.com/sheet.html#id=2311944). This is theoretically possible because you are able to access the link and associated sheet without being logged into Myth-Weavers.

PART ONE: I need to be able to have the application enter the site and use my log-in credentials to enter my account. When I log into the site the following form data is sent on the network:

vb_login_username: Testbug Jones
vb_login_password: 
s: 
securitytoken: guest
do: login
vb_login_md5password: fea5ff2cf4764d2e76ea81e68bb458d1
vb_login_md5password_utf: fea5ff2cf4764d2e76ea81e68bb458d1

I am using the following code to check my progress through the log in:

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
  }

login_data = {
    's' : '',
    'securitytoken' : 'guest',
    'vb_login_username' : 'Testbug Jones',
    'vb_login_password' : 'TeStBuG',
    'redirect' : 'index.php',
    'login' : 'Login',
    'vb_login_md5password' : 'fea5ff2cf4764d2e76ea81e68bb458d1',
    'vb_login_md5password_utf' : 'fea5ff2cf4764d2e76ea81e68bb458d1'
}


#get page
url = 'https://www.myth-weavers.com/'
source = requests.get(url)

#isolates login form, along with an sid
print('\n\n***CURRENT LOGIN STATUS***')
login_status = source.text
login_status = login_status.split("<!-- login form -->")[1]
login_status = login_status.split("<!-- / login form -->")[0]
print(login_status)

#nab sid and update library
sid  = login_status.split('<input type="hidden" name="s" value="')[1]
sid = sid.split('" /')[0]
login_data['s'] = sid

#create session and attempt to log in
with requests.Session() as s:
  print('\n\n***ATTEMPTING TO LOGIN***')
  r = s.post(url, data = login_data, headers = headers)
  login_status = r.text
  login_status = login_status.split("<!-- login form -->")[1]
  login_status = login_status.split("<!-- / login form -->")[0]
  print(login_status)

As for the login form itself, it normally looks like:

<li class="smallfont" id="login" style="width: auto; float: right; text-align: right; padding-right: 6px;">
        <span id="login_register"><a href="#" onclick="fetch_object('login_register').style.display = 'none'; fetch_object('login_form').style.display = ''; return false;" tabindex="0">Log In</a> / <a href="https://www.myth-weavers.com/register.php?s=f4cde1e552e96a9a2b4c4479559e6510">Register</a> <a href="//www.myth-weavers.com/login.php?do=lostpw" style="font-size:smaller">forgot password?</a></span>
        <form id="login_form" style="display: none;" action="https://www.myth-weavers.com/login.php?do=login" method="post" onsubmit="md5hash(vb_login_password, vb_login_md5password, vb_login_md5password_utf, 0)">
        <script type="text/javascript" src="//static.myth-weavers.com/clientscript/vbulletin_md5.js?v=388"></script>

        <input type="text" class="bginput" style="font-size: 10px" name="vb_login_username" id="navbar_username" size="10" accesskey="u" tabindex="0" value="User Name" onfocus="if (this.value == 'User Name') this.value = '';" onblur="if (this.value == '') this.value = 'User Name';" />

        <input type="password" class="bginput" style="font-size: 10px" name="vb_login_password" id="navbar_password" size="10" tabindex="0" value="Password" onfocus="if (this.value == 'Password') this.value = '';" onblur="if (this.value == '') this.value = 'Password';" />

    <label for="cb_cookieuser_navbar"><input type="checkbox" name="cookieuser" value="1" tabindex="0" id="cb_cookieuser_navbar" accesskey="c" />Remember Me?</label>

        <input type="submit" class="button" value="Log in" tabindex="0" title="Enter your username and password in the boxes provided to login, or click the 'register' button to create a profile for yourself." accesskey="s" />

        <input type="hidden" name="s" value="f4cde1e552e96a9a2b4c4479559e6510" />
        <input type="hidden" name="securitytoken" value="guest" />
        <input type="hidden" name="do" value="login" />
        <input type="hidden" name="vb_login_md5password" />
        <input type="hidden" name="vb_login_md5password_utf" />
        </form>
</li>

At this point I think what is stopping me is one of three things: 1) my syntax, as I am obviously new; 2) cookies are not being handled correctly; or 3) the securitytoken/sid is not being handled correctly. I'm reaching the point where I can see my errors but not the way to overcome them. Any help or insight in getting past this would be very helpful!

PART TWO: This will allow me to access a page on the site, specifically the "Sheets" page, and print out a list of all character sheets found there. It should also be able to retrieve the JSON files stored in the table rows where the character names are found.

Desi Dao

1 Answer

Make the first request through a requests.Session() so that the cookies it receives are sent back when you post to /login.php. You can also use BeautifulSoup to collect every input name/value pair in the login form, so that you only add your username and password and hardcode nothing else.

The password is MD5-hashed, so you can use hashlib to compute the hash.
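As a quick sanity check, the digest that hashlib produces can be compared with the vb_login_md5password value captured in the browser's network tab. This is only a minimal sketch: md5_hex is a hypothetical helper name, and the captured value is copied from the form data in the question.

```python
import hashlib


def md5_hex(password):
    """Return the hex MD5 digest of a password encoded as UTF-8 (hypothetical helper)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Value of vb_login_md5password seen in the browser (copied from the question).
captured = "fea5ff2cf4764d2e76ea81e68bb458d1"

# Prints True only if the script hashes the password the same way the page did.
print(md5_hex("TeStBuG") == captured)
```

If the comparison prints False, the password string or its encoding is the first thing to look at before suspecting cookies or tokens.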

The following makes the login call:

import requests
from bs4 import BeautifulSoup
import hashlib

url = "https://www.myth-weavers.com"
username = "Testbug Jones"
password = "TeStBuG"

s = requests.Session()
r = s.get(url)

soup = BeautifulSoup(r.text, "html.parser")
form = soup.find("form",{"id":"login_form"})
# collect every named <input> in the login form (hidden fields included)
payload = {
    t.get("name"): t.get("value", "")
    for t in form.find_all("input")
    if t.get("name")
}

md5 = hashlib.md5(password.encode('utf-8')).hexdigest()
payload["vb_login_username"] = username
payload["vb_login_password"] = password
payload["vb_login_md5password"] = md5
payload["vb_login_md5password_utf"] = md5

r = s.post(f"{url}/login.php", 
    params= {"do": "login"},
    data = payload
)
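One way to confirm that the login worked is to check whether the page returned by the POST still contains the login form. The sketch below rests on an assumption this thread does not confirm: that a logged-in page no longer includes the form with id login_form. The names login_form_present and _FormFinder are hypothetical. It uses only the standard library so it runs without extra installs; with BeautifulSoup, testing whether soup.find("form", {"id": "login_form"}) is None would do the same job.

```python
from html.parser import HTMLParser


class _FormFinder(HTMLParser):
    """Records whether a <form> with the given id appears in the markup."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "form" and dict(attrs).get("id") == self.form_id:
            self.found = True


def login_form_present(html, form_id="login_form"):
    """Return True if the page still shows the login form.

    Assumption: a logged-in page omits this form.
    """
    finder = _FormFinder(form_id)
    finder.feed(html)
    return finder.found
```

After the POST above, calling login_form_present(r.text) and getting True would suggest the session is still logged out.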

Then you can use s.get(".....") on the same session to fetch the sheets data, like this:

r = s.get(f"{url}/sheets")
soup = BeautifulSoup(r.text, "html.parser")
rows = soup.find("table").find_all("tr")[1:]
sheet_data = []
for row in rows:
    tds = row.find_all("td")
    download_link = f'{url}{tds[5].find("a")["href"]}'
    sheet_response = s.get(download_link)  # fetch this sheet's JSON export
    sheet_data.append({
        "name": tds[1].text.strip(),
        "template": tds[2].text.strip(),
        "game": tds[3].text.strip(),
        "download_link": download_link,
        "json": sheet_response.json()
    })

print(sheet_data)

run this on repl.it
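The question also mentions direct links such as https://www.myth-weavers.com/sheet.html#id=2311944. The sheet id sits in the URL fragment (the part after #), which is not sent to the server with the request, so a script has to pull it out itself. The sketch below only extracts the id; sheet_id_from_link is a hypothetical helper name, and how the site then serves that sheet's data is not covered in this thread.

```python
from urllib.parse import parse_qs, urlparse


def sheet_id_from_link(link):
    """Return the sheet id from a link's fragment, or None if there is none."""
    fragment = urlparse(link).fragment  # e.g. "id=2311944"
    values = parse_qs(fragment).get("id", [])
    return values[0] if values else None


print(sheet_id_from_link("https://www.myth-weavers.com/sheet.html#id=2311944"))
```

The id comes back as a string, which is convenient if it is later placed into another URL.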

Bertrand Martel
  • Thank you! This solved the problem, but more importantly it gave me a helpful pointer to the backend literature I need to read (i.e., encoding and BeautifulSoup). I do have a follow-up question, though: how did you know to ignore the user-agent requirement? Every video I've viewed so far has stressed the point of getting it. – Desi Dao Oct 03 '20 at 18:27
  • Some servers/websites check that the user-agent header comes from a valid browser. In this case the site doesn't check any headers. In practice, when I copy an HTTP call from the Chrome developer console, I right-click, choose "Copy as cURL", and remove all the headers to see if it still works without them. – Bertrand Martel Oct 03 '20 at 19:04