I'm trying to log in to this website https://lse.co.uk but have been unsuccessful. I have looked on StackOverflow and read multiple questions & answers, but they are all for different cases. Either that or I missed one that matches this one.

This is what I have.

import requests

login_url = "https://www.lse.co.uk/login.html"
s = requests.Session()
payload = {
    "txtEmail": "some@email.co.uk", 
    "txtPassword": "somepassword"
}
r = s.post(login_url, data=payload)

I also tried the above, but with the credentials encoded in Base64.

Inspecting the HTML in Chrome DevTools, I can see a Base64 string. Should I capture this and use it to encode both the username and password? The Base64 string is not visible in the output of r.content, so I'm not sure how to do this either.
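For reference — and this is only an assumption, since the form above submits plain values — if the site did expect Base64-encoded credentials, the encoding itself is straightforward in Python:

```python
import base64

email = "some@email.co.uk"
# Base64 works on bytes, so encode to UTF-8 first, then decode the
# result back to a plain string for use in a form payload.
encoded = base64.b64encode(email.encode("utf-8")).decode("ascii")
# base64.b64decode(encoded) recovers the original bytes.
```

Whether the site actually wants this (and what string it mixes in, if any) can only be determined from the page's JavaScript, so treat this as a sketch of the mechanics, not the site's actual scheme.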


James

2 Answers

Looking at the form, it's likely you are not submitting all of its inputs. Sending only the two fields you care about isn't enough.

The server-side code handling the form is likely expecting more. First, there are two hidden inputs that give some context:

<input type="hidden" name="txtFormType" value="LOGIN">
<input type="hidden" name="txtLoginSource" value="MAIN">

so you should add them to your payload:

payload = {
    "txtEmail": "some@email.co.uk",
    "txtPassword": "somepassword",
    "txtFormType": "LOGIN",
    "txtLoginSource": "MAIN"
}

If you're lucky, that's all it's looking for, and the form will work.
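Rather than hard-coding the hidden fields, you can also collect every named input of the form automatically, so nothing gets missed if the form changes. A minimal sketch with lxml — the sample HTML here is a stand-in built only from the fields shown above; the real page will have more:

```python
from lxml import etree

def collect_form_payload(html, form_xpath="//form"):
    """Build a payload dict from every named <input> in a form,
    so hidden fields are picked up automatically."""
    tree = etree.fromstring(html, etree.HTMLParser())
    payload = {}
    for inp in tree.xpath(form_xpath + "//input[@name]"):
        payload[inp.get("name")] = inp.get("value") or ""
    return payload

# Stand-in for the real login page (only the fields shown above):
sample = """
<form class="login__form">
  <input type="text" name="txtEmail">
  <input type="password" name="txtPassword">
  <input type="hidden" name="txtFormType" value="LOGIN">
  <input type="hidden" name="txtLoginSource" value="MAIN">
</form>
"""
payload = collect_form_payload(sample, "//form[@class='login__form']")
# Fill in the fields you control on top of the defaults:
payload.update({"txtEmail": "some@email.co.uk",
                "txtPassword": "somepassword"})
```

In practice you'd pass `r.text` from a GET of the login page instead of `sample`; the payload then goes into the POST as before.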

If you're not, that means you also need to provide the reCAPTCHA hidden element, which is there to prevent scripted access to the login page (mostly to stop brute force by bots, with the side effect of being a brain fsck for people trying to write legitimate scripts).

So let's check that:

>>> r = requests.get(login_url)

then you need to use an html parser, like lxml:

>>> from lxml import etree

and you have to parse the HTML:

>>> tree = etree.fromstring(r.text, etree.HTMLParser())

and there you try to fetch it:

>>> tree.xpath("//form[@class='login__form']/input[@name='g-recaptcha-response-v3']")
[]

heck, it's not there!

That's because the hidden input is most likely added by JavaScript when the page is loaded. So there you're doomed: there's no easy solution.

One of the solutions is to pull out the big guns: use a real browser to open the page, let the Google JavaScript run, do a few things to avoid being detected as a bot (like resizing the window when loading the page), and fetch that hidden input's value.

Fortunately, you can use Selenium to do that, cf. that answer. I won't get into how to install Selenium, but your code might look like:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(service=Service(r'/path/to/chromedriver'),
                          options=options)
driver.get(login_url)

# Wait for the reCAPTCHA script to add the hidden input, then read its
# value so you can add it to your payload:
token = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.NAME, "g-recaptcha-response-v3"))
).get_attribute("value")

I'm sorry I'm not going deeper into that solution, but you should have enough to get started and explore it.

zmo
I'm not that good with Python and I'm also still learning requests, but I can try to help you look at the response. You can try:

print(r.text)

You will see the website's response. This is not a fix, but it's a way to see if something went wrong.

Teeext