I am attempting to scrape a website using the following Python code:
import re
import requests

def get_csrf(page):
    # Pull the CSRF token out of the hidden form field in the login page HTML.
    # [^"]* stops at the closing quote, so the match cannot run past the value.
    matchme = r'name="csrfToken" value="([^"]*)" /'
    csrf = re.search(matchme, page)
    return csrf.group(1)

def login():
    login_url = 'https://www.edline.net/InterstitialLogin.page'
    with requests.Session() as s:
        # Fetch the login page first so the session gets its cookies and CSRF token.
        login_page = s.get(login_url)
        csrf = get_csrf(login_page.text)
        username = 'USER'
        password = 'PASS'
        login_data = {'screenName': username,
                      'kclq': password,
                      'csrfToken': csrf,
                      'TCNK': 'authenticationEntryComponent',
                      'submitEvent': '1',
                      'enterClicked': 'true',
                      'ajaxSupported': 'yes'}
        page = s.post(login_url, data=login_data)
        # Request the page I actually want to scrape.
        r = s.get("https://www.edline.net/UserDocList.page?")
        print(r.text)

login()
This code logs into https://www.edline.net/InterstitialLogin.page successfully, but fails when I try to do:
r = s.get("https://www.edline.net/UserDocList.page?")
print(r.text)
It doesn't print the expected page; instead, it throws an error. Upon further testing, I discovered that the same error occurs even if you go directly to the page from a browser. This suggests that the only way to reach the page is to run the code that executes when the button leading to it is clicked. When I inspected the page source, I found that the button linking to the page I'm trying to scrape uses the following code:
<a href="javascript:submitEvent('viewUserDocList', 'TCNK=headerComponent')" tabindex="-1">Private Reports</a>
So essentially, I am looking for a way to trigger the above JavaScript code from Python in order to scrape the resulting page.
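One idea I have, as a rough sketch only: since requests cannot run JavaScript, I could at least pull the two arguments out of the href and then try to reproduce whatever request submitEvent sends. The helper name parse_submit_event is my own, and I am assuming the second argument is a query-string style list of form fields (it looks like one, given that TCNK also appears in my login payload). I have not confirmed what submitEvent actually does with these values.

```python
import re
from urllib.parse import parse_qsl

def parse_submit_event(href):
    """Split a javascript:submitEvent('event', 'k=v&...') href into its parts.

    Hypothetical helper: it only parses the link text and does not
    know how the site's submitEvent function uses the values.
    """
    match = re.search(r"submitEvent\('([^']*)'\s*,\s*'([^']*)'\)", href)
    if match is None:
        raise ValueError("no submitEvent call found in href")
    event, query = match.groups()
    # Assumption: the second argument is a query string of extra form fields.
    return event, dict(parse_qsl(query))

href = "javascript:submitEvent('viewUserDocList', 'TCNK=headerComponent')"
event, params = parse_submit_event(href)
print(event, params)
```

If the browser's network tab shows which URL and form fields are sent when the link is clicked, those values could presumably be passed to s.post(...) inside the same session, but the field names would have to come from the site itself rather than from a guess.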