I am trying to make a timetable app for my university schedule, since our school website does not provide a visualized timetable. Instead, it provides something like this:

[image: description of registered courses]

With this in mind, I have already started on this project and listed out the steps I need to take in order to complete it.

  1. Write a function that logs into the school website
  2. Write a function that saves the HTML file that includes the description of registered courses
  3. Write a function that scrapes the data from the HTML file and saves the necessary data into fields such as COURSE_ID, COURSE_LOCATION, COURSE_STARTTIME, COURSE_ENDTIME, etc.
  4. Write a function that builds a visualized timetable with these fields as parameters

Of these four generalized steps, I have completed the third step, which is the data scraping portion. However, I have run into some problems and could not figure out how to do step 1. I was wondering if anyone could help me out here.

To provide more specific details, the school website link is https://ics.twu.ca/ICS/. From here, I do not know how to write a script that can request the URL and make a POST request with username and password.

I am writing this program in Python.

Jack Park
  • Please re-take the intro tour, especially [How to Ask](https://stackoverflow.com/help/how-to-ask). Perhaps first, see the meta-post ["Can someone help me" is not an actual question](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question). – Prune Mar 17 '20 at 18:27
  • `I have run into some problems` what are these ? – Bertrand Martel Mar 17 '20 at 18:51
  • the main problem is that the csrf verification fails when I do GET requests – Jack Park Mar 18 '20 at 00:14

1 Answer

The authentication request is a POST to https://ics.twu.ca/ICS/. Collect every input name/value pair from the login page, and use a session so that the cookies persist between requests. The form uses multipart/form-data, so pass the fields through the `files` parameter.

import requests
from bs4 import BeautifulSoup

url = "https://ics.twu.ca/ICS/"

username = "your_username"
password = "your_password"

session = requests.Session()

r = session.get(url)

soup = BeautifulSoup(r.text, "html.parser")

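# Collect every named <input> as name -> (None, value); inputs without a
# name attribute are skipped so that i["name"] cannot raise a KeyError.
payload = {
    i["name"]: (None, i.get("value"))
    for i in soup.find_all("input", attrs={"name": True})
}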

payload["userName"] = username
payload["password"] = password

print(payload)

r = session.post(url, files = payload)

print(r.text)

In the code above, `payload` is a dictionary that maps each input name to a tuple `(None, value)`. The first element of the tuple is the filename, which is `None` because we are not uploading files, just sending field values.
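
To see what the `(None, value)` tuples turn into, here is a minimal sketch that builds the request without sending it and prints the encoded body. The credentials are placeholders, and only two fields are shown for brevity; the real page will likely have more hidden inputs.

```python
import requests

# Placeholder credentials; the real payload would also carry the hidden inputs.
payload = {
    "userName": (None, "your_username"),
    "password": (None, "your_password"),
}

# Prepare the request locally (nothing is sent over the network).
prepared = requests.Request(
    "POST", "https://ics.twu.ca/ICS/", files=payload
).prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body.decode("utf-8")

print(content_type)  # multipart/form-data with a generated boundary
print(body)          # one form-data part per field, with no filename attribute
```

Because the filename is `None`, each part is written as a plain form field rather than a file upload.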

If this does not work, add request headers such as `User-Agent`.
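
One way to do that is to set the header on the session, so it goes out with both the GET and the POST. This is only a sketch: the User-Agent string below is an assumed browser-like value, not something this site is known to require.

```python
import requests

# Assumed browser-like value, for illustration only.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"

session = requests.Session()
# Headers set on the session are merged into every request it makes.
session.headers.update({"User-Agent": USER_AGENT})

# Prepare a request locally to confirm the header is applied (nothing is sent).
prepared = session.prepare_request(
    requests.Request("GET", "https://ics.twu.ca/ICS/")
)
print(prepared.headers["User-Agent"])
```

With the header set this way, the `session.get(url)` and `session.post(url, files=payload)` calls from the answer need no changes.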

Bertrand Martel