
Following this answer to a similar question, I am trying to scrape a site that provides the content I need directly after logging in:

import requests
creds = {'username_key': 'username_value', 'pw_key': 'pw_value'}
url = 'https://mollybet.com/beta/trade'
response = requests.post(url, data=creds) 

But I cannot find out from the log-in page's HTML what the username and password keys need to be, and the status_code I keep getting in the response object is 405 (Method Not Allowed).
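
For reference, this is roughly how I have been trying to discover the field keys, by listing the name attribute of every input in the static HTML (if the page builds its form with JavaScript, nothing useful may show up here):

import requests
import bs4 as bs

# Rough sketch: list the name/type of every form input in the log-in page's static HTML
page = requests.get('https://mollybet.com/beta/trade')
soup = bs.BeautifulSoup(page.text, 'lxml')
for form in soup.find_all('form'):
    print('form action:', form.get('action'))
    for field in form.find_all('input'):
        print('  name:', field.get('name'), 'type:', field.get('type'))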

  1. Is it obvious from the tags in the HTML what the key values need to be, or am I completely off in the way I am trying to resolve this issue?

I also tried logging in with Selenium (chromedriver) and, again, I cannot identify the input field elements. For example, although this code does locate the element I am targeting on the log-in page:

from selenium import webdriver
import bs4 as bs  # needed for bs.BeautifulSoup below

webdr_browser = webdriver.Chrome()
webdr_browser.get(url)
soup = bs.BeautifulSoup(webdr_browser.page_source, 'lxml')

>>> soup.find('input', class_='jss91 jss76')
<input aria-invalid="false" class="jss91 jss76" type="text" value=""/>

But when I try to locate the same element in order to click it:

>>> webdr_browser.find_element_by_class_name('jss91 jss76')
Traceback (most recent call last):
...
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".jss91 jss76"}

Other find_element_by_* methods also fail, so:

  2. Any idea why?
Tony
  • So you know what username and password to use, but you don't know what to fill in for `x` and `y` in `{x: 'username_value', y: 'pw_value'}`? – chepner Jan 10 '20 at 17:40
  • @chepner, yes, exactly that – Tony Jan 10 '20 at 17:52
  • You may want to check the terms of service, it may be against them to login this way, which may be why they are not helping you do this. – sconfluentus Jan 10 '20 at 18:03

1 Answer


I would suggest just using Selenium to fill out the info for you. I've never really trusted searching by class unless it was truly necessary; generated class names like these are likely to change and don't tell you much about the structure. But since the page is fairly simple, searching by tag name seems to do the trick.

from selenium import webdriver
driver = webdriver.Chrome()

driver.get('https://mollybet.com/beta/login')

# Locate the username and password input fields
fields = driver.find_elements_by_tag_name('input')
fields[1].send_keys('USERNAME')
fields[2].send_keys('PASSWORD')

# Click the submit button
driver.find_element_by_tag_name('button').click()

From here, you can use Selenium or BeautifulSoup to parse the page contents going forward.
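
If the inputs are rendered by JavaScript, they may not exist the instant get() returns, so an explicit wait makes the tag-name approach more robust. A small sketch of the same idea using Selenium's WebDriverWait (the 10-second timeout is arbitrary):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://mollybet.com/beta/login')

# Wait (up to 10 seconds) until the input fields have been rendered
fields = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.TAG_NAME, 'input')))

fields[1].send_keys('USERNAME')
fields[2].send_keys('PASSWORD')
driver.find_element_by_tag_name('button').click()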

The issue you were having is that find_element_by_class_name expects a single class name, so passing 'jss91 jss76' builds the selector .jss91 jss76, which matches nothing. To match on both classes at the same time, modify the argument to:

driver.find_element_by_class_name('jss91.jss76')

But pay attention to which element you get back, because both the username and password fields share those two classes:

for field in fields:
    print(field.get_attribute('class'))

# jss91 jss76
# jss91 jss76 jss94 jss79
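
If you want to target the fields directly rather than by index, a compound CSS selector also works; a sketch that assumes the generated class names stay as shown above and that the password input has type="password" (neither of which is guaranteed):

# Password field: the one carrying the extra jss94/jss79 classes
password = driver.find_element_by_css_selector('input.jss94.jss79')

# Or, independent of the generated class names, select by input type
username = driver.find_element_by_css_selector('input[type="text"]')
password = driver.find_element_by_css_selector('input[type="password"]')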

To do it with requests, I monitored the network traffic as I submitted the login form on the site. It looks like the form submits to https://mollybet.com/s/weblogin/, and the payload sent was {'username': "user", 'password': "pass", 'lang': "en"}. So in theory the following should work, but I'm getting a 400 error, even after adding the headers from the original request. If the credentials were wrong it should be a 401 error, so perhaps it will work with your login.

headers = {
    'Host': 'mollybet.com',
    'Connection': 'keep-alive',
    'Content-Length': '49',
    'Origin': 'https://mollybet.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',  # noqa
    'content-type': 'application/json',
    'Accept': '*/*',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://mollybet.com/beta/login',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
}

response = requests.post(
    'https://mollybet.com/s/weblogin/',
    data={'username': "user", 'password': "pass", 'lang': "en"},
    headers=headers,
    verify=False)
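
One thing that stands out is that the captured request's content-type is application/json, while data= sends a form-encoded body. If that mismatch is part of the 400, sending the payload with json= through a Session (so the login cookies carry over to later requests, reusing the headers dict above) might behave better; a sketch, untested against this site:

import requests

session = requests.Session()
payload = {'username': 'user', 'password': 'pass', 'lang': 'en'}

# Drop the hard-coded Content-Length so requests can compute it itself
json_headers = {k: v for k, v in headers.items() if k != 'Content-Length'}

# json= serializes the payload and sets Content-Type: application/json
response = session.post('https://mollybet.com/s/weblogin/',
                        json=payload, headers=json_headers)
print(response.status_code, response.text)

# Any cookies set by the login are reused on subsequent requests
trade_page = session.get('https://mollybet.com/beta/trade')
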
Cohan
  • Thanks! Your solution does work and it also answers the second part of my question. I will mark it as accepted answer unless someone resolves the preferable `requests` approach – Tony Jan 10 '20 at 18:15
  • Added a little more to help you along the way. Not sure how to solve the 400 error though. – Cohan Jan 10 '20 at 18:40
  • Hmmm that is weird (re your requests solution); `response.text` returns `'{"status": "error", "code": "incorrect_username_or_password", "data": null}'` although the credentials are correct.. – Tony Jan 10 '20 at 18:41
  • Ah, so it does. I was just looking at the status code rather than the response text. They might have some other measure going on that I don't know about. Worst case, the Selenium approach seems to be working. – Cohan Jan 10 '20 at 18:43