I'm trying to scrape my school's website for my upcoming assignments and save them to a file. However, I need to log in to see my assessments, and the website is dynamically loaded, so I need to use Selenium. My problem is that I'm using the requests package to authenticate myself on the website, but I don't know how to open the website with Selenium so that it stays logged in. After that, I'm hoping to take the HTML and scrape it with Beautiful Soup; I would prefer not to learn another framework. Here is my code:
'''
import json
from requests import Session
from bs4 import BeautifulSoup
from selenium import webdriver

# Login function that takes the username and password
def login(username, password):
    s = Session()
    payload = {
        'username' : username,
        'password': password
    }
    res = s.post('https://www.website_url.com', json=payload)
    print(res.content)
    return s

session = login('username', "password")
driver_path = r'C:\Users\username\Downloads\edgedriver_win64\msedgedriver.exe'
url = 'https://www.website_url.com/assessments/upcoming'
driver = webdriver.Edge(driver_path)
driver.get(url)

'''
The website loads, but it sends me back to the login page.

P.S. I managed to open the website with Beautiful Soup, but since it is dynamically loaded I can't scrape it.

Edit: Hey, thanks for the answer! I tried it and it should work, but sadly it is throwing a lot of errors:

[9308:26392:0215/111025.239:ERROR:chrome_browser_main_extra_parts_metrics.cc(251)] START: GetDefaultBrowser(). If you don't see the END: message, this is crbug.com/1216328.        
[9308:7708:0215/111025.270:ERROR:device_event_log_impl.cc(214)] [11:10:25.271] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9308:7708:0215/111025.281:ERROR:device_event_log_impl.cc(214)] [11:10:25.287] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9308:26392:0215/111025.313:ERROR:chrome_browser_main_extra_parts_metrics.cc(255)] END: GetDefaultBrowser()

I'm not sure what this is. I had a look at the XPath, and I think it changed when I resized the window. My teacher (who isn't familiar with Python) suggested that I log in to the website in one window and then open another tab with Selenium, so that I could skip the login because I'd already be logged in on the other tab. I've looked around for how to open a new tab rather than a new window, but I can't find anything. Thank you!

Hey, I just found the answer. The problem was that the HTML id and the XPath were changing on each reload, and I didn't realize I could use CSS selectors, so I did that instead. You've helped me a lot, I appreciate it.

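# The find_element(s)_by_* helpers were removed in newer Selenium 4 releases;
# find_element(By.CSS_SELECTOR, ...) works in both Selenium 3 and 4.
from selenium.webdriver.common.by import By

login_box = driver.find_element(By.CSS_SELECTOR, 'body > div.login > div.auth > div.loginBox')
input_boxes = driver.find_elements(By.CSS_SELECTOR, '.login>.auth label>input')
input_buttons = driver.find_elements(By.CSS_SELECTOR, '.login>.auth button')
input_boxes[0].send_keys(username)
input_boxes[1].send_keys(password)
input_buttons[0].click()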
CoopCodes

1 Answer

You can use Selenium WebDriver itself to log in to your school's website, so that the logged-in session lives in the browser Selenium controls, and then load the page you want to scrape.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver_path = r'C:\Users\username\Downloads\edgedriver_win64\msedgedriver.exe'
url = 'https://www.website_url.com/assessments/upcoming'
login_url = 'https://www.website_url.com'
username = 'username'  # placeholder credentials
password = 'password'

driver = webdriver.Edge(driver_path)
driver.get(login_url)
# Replace the placeholder strings with the real XPaths from your login page
driver.find_element(By.XPATH, "username input xpath").send_keys(username)
driver.find_element(By.XPATH, "password input xpath").send_keys(password)
driver.find_element(By.XPATH, "submit button xpath").click()

# wait for the login to finish, then load the page you want to scrape
driver.get(url)

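You can also POST the credentials directly to the login page. Note that Selenium's own drivers have no request method; this call assumes the third-party selenium-requests package, where webdriver is a driver instance created from one of that package's driver classes: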

webdriver.request('POST', login_url, data={"username": username, "password": password})
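
If you would rather keep the requests login from your question, another option is to copy the session's cookies into the Selenium browser. This is only a sketch and it assumes the site tracks the login with cookies, which I can't confirm for your school's site. The helper names to_selenium_cookies and copy_login_to_driver are made up for the example.

```python
def to_selenium_cookies(cookie_jar):
    """Turn cookies from a requests session into dicts for driver.add_cookie()."""
    cookies = []
    for cookie in cookie_jar:
        # The domain is left out on purpose so the browser attaches the
        # cookie to the page that is currently loaded.
        entry = {
            "name": cookie.name,
            "value": cookie.value,
            "path": cookie.path or "/",
        }
        if cookie.secure:
            entry["secure"] = True
        cookies.append(entry)
    return cookies


def copy_login_to_driver(session, driver, base_url):
    """Copy the cookies of a logged-in requests session into a Selenium driver."""
    # Selenium only accepts cookies for the site that is currently open,
    # so load a page on that site before adding them.
    driver.get(base_url)
    for entry in to_selenium_cookies(session.cookies):
        driver.add_cookie(entry)
```

With the session returned by your login() function you would call copy_login_to_driver(session, driver, 'https://www.website_url.com') and then driver.get(url). If the site keeps its login somewhere other than cookies, this will not work and logging in through Selenium is the safer route.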

  • For the window size part, this should help.

  • You can ignore these errors; they are just log output from the browser and WebDriver, not failures in your script.

  • I personally don't think you need a new tab, but you can try it out. This post has a lot of helpful answers.
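
If you do want to try the new tab idea, here is a small sketch using switch_to.new_window, which is available from Selenium 4 onwards. The function name open_in_new_tab is my own. Keep in mind that WebDriver normally starts the browser with its own fresh profile, so the new tab only shares a login made inside the Selenium-controlled window, not one from your everyday browser.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab of the Selenium-controlled browser.

    Returns the window handle of the new tab so you can switch back to it later.
    """
    # Requires Selenium 4+: opens a tab and moves the driver's focus to it.
    driver.switch_to.new_window("tab")
    driver.get(url)
    return driver.current_window_handle
```

After logging in with the driver, calling open_in_new_tab(driver, url) should load the assessments page in a second tab that reuses the same cookies.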

Let me know if you need more help.

ZhorGen