I'm trying to scrape my school's website for my upcoming assignments and save them to a file. I need to log in to see my assessments, and the website is dynamically loaded, so I need to use Selenium. My problem is that I authenticate with the requests package, but I don't know how to open the website in Selenium with that same login. After that, I'm hoping to take the HTML and scrape it with Beautiful Soup; I would prefer not to learn another framework. Here is my code:
'''
import json
from requests import Session
from bs4 import BeautifulSoup
from selenium import webdriver
# Login function that takes the username and password
def login(username, password):
    s = Session()
    payload = {
        'username': username,
        'password': password
    }
    res = s.post('https://www.website_url.com', json=payload)
    print(res.content)
    return s
session = login('username', "password")
driver_path = r'C:\Users\username\Downloads\edgedriver_win64\msedgedriver.exe'
url = 'https://www.website_url.com/assessments/upcoming'
driver = webdriver.Edge(driver_path)
driver.get(url)
'''
The website loads, but it redirects me back to the login page.

P.S. I managed to open the website with Beautiful Soup, but since the page is dynamically loaded, I can't scrape it that way.
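For reference, the redirect is expected with the code above: the browser that Selenium starts knows nothing about the cookies held by the requests `Session`. One possible approach, shown here as a rough sketch and not tested against the school's site, is to copy the session cookies into the driver. The helper names `cookies_for_selenium` and `apply_cookies` are made up for illustration; `driver.get` and `driver.add_cookie` are the only Selenium calls used. It also assumes the site keeps the login purely in cookies, which may not be true.

```python
def cookies_for_selenium(cookies):
    """Convert http.cookiejar-style cookies (e.g. session.cookies from
    requests) into the dict format that driver.add_cookie() expects."""
    converted = []
    for c in cookies:
        entry = {
            "name": c.name,
            "value": c.value,
            "path": c.path or "/",
            "secure": bool(c.secure),
        }
        if c.domain:
            entry["domain"] = c.domain
        if c.expires is not None:
            entry["expiry"] = int(c.expires)
        converted.append(entry)
    return converted


def apply_cookies(driver, cookies, base_url, target_url):
    """Load base_url first (Selenium only accepts cookies for the domain
    it is currently on), add the cookies, then open target_url."""
    driver.get(base_url)
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.get(target_url)
```

With the code from the question, this would be called roughly as `apply_cookies(driver, cookies_for_selenium(session.cookies), 'https://www.website_url.com', url)`.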
Edit: Hey, thanks for the answer! I tried it and it looks like it should work, but sadly it is throwing a lot of errors:
[9308:26392:0215/111025.239:ERROR:chrome_browser_main_extra_parts_metrics.cc(251)] START: GetDefaultBrowser(). If you don't see the END: message, this is crbug.com/1216328.
[9308:7708:0215/111025.270:ERROR:device_event_log_impl.cc(214)] [11:10:25.271] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9308:7708:0215/111025.281:ERROR:device_event_log_impl.cc(214)] [11:10:25.287] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9308:26392:0215/111025.313:ERROR:chrome_browser_main_extra_parts_metrics.cc(255)] END: GetDefaultBrowser()
I'm not sure what this is. I had a look at the XPath, and I think it changed when I resized the window. My teacher (who isn't familiar with Python) suggested that I log in to the website in one window and then open another tab with Selenium, so that I could skip the login because I'd already be logged in on the other tab. I've looked around for how to open a new tab rather than a new window, but I can't find anything. Thank you!
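On the new-tab part, here is a minimal sketch of one way it could be done, using only `driver.execute_script`, `driver.window_handles`, `driver.switch_to.window` and `driver.get`. The helper name `open_new_tab` is made up for illustration. Note the limitation: the new tab belongs to the browser that Selenium started, which by default uses a fresh temporary profile, so it would only share a login performed in that same Selenium-controlled browser, not one from a browser window opened by hand.

```python
def open_new_tab(driver, url):
    """Open url in a new tab of the Selenium-controlled browser,
    switch to that tab, and return its window handle."""
    existing = set(driver.window_handles)
    # window.open with '_blank' normally opens a tab rather than a window.
    driver.execute_script("window.open('about:blank', '_blank');")
    new_handles = [h for h in driver.window_handles if h not in existing]
    if not new_handles:
        raise RuntimeError("No new tab was opened")
    driver.switch_to.window(new_handles[0])
    driver.get(url)
    return new_handles[0]
```

Switching back to the first tab would then be `driver.switch_to.window(driver.window_handles[0])`.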
Hey, I just found the answer. The problem was that the HTML id and the XPath were changing on each reload, and I didn't realize I could use CSS selectors, so I did that. You've helped me a lot, I appreciate it.
login_box = driver.find_element_by_css_selector('body > div.login > div.auth > div.loginBox')
input_boxes = driver.find_elements_by_css_selector('.login>.auth label>input')
input_buttons = driver.find_elements_by_css_selector('.login>.auth button')
input_boxes[0].send_keys(username)
input_boxes[1].send_keys(password)
input_buttons[0].click()