Scraping a secure website requiring clicks on javascript links

Question

I have a daily task at work to download some files from an internal company website. The site requires a login. The main URL is something like:

https://abcd.com

But when I open that in the browser, it redirects to something like:

https://abcdGW/ln-eng.aspx?lang=eng&lnid=e69d5d-xxx-xxx-1111cef&regl=en-US

My task is normally to open this site, log in, click through some links, and download some files. This takes me 10 minutes every day, and I want to automate it using Python. Using my basic knowledge, I have written the code below:

import requests
from bs4 import BeautifulSoup

url = "https://abcd.com"
acc_pwd = {'datasouce': 'Data1', 'user': 'xxxx', 'password': 'xxxx'}

# A Session stores cookies between requests, so the login carries over
# to every later request made through it.
session = requests.Session()

# Follow the redirect to find the real login URL.
redirectURL = session.get(url).url

# Load the login page (this sets any initial cookies), then post the credentials.
r = session.get(redirectURL)
soup = BeautifulSoup(r.text, "html.parser")
r = session.post(redirectURL, data=acc_pwd)

print("RData %s" % r.text)

This shows that I am able to log in successfully. The next step is where I am stuck. On the page after login there are some links on the left side, one of which I need to click. When I inspect them in Chrome, I see:

<a href="javascript:__doPostBack('myAppControl$menu_itm_proj11','')"><div class="menu-cell">
    <img class="menu-image" src="images/LiteMenu/projects.png" style="border-width:0px;"><span class="menu-text">Projects</span> </div></a>

This is probably a JavaScript link. I need to click it, then click another link on the new page, then another to download a file, then go back to the main page and repeat all of this to download different files.
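For context, a link like the one above does not point at a URL. On ASP.NET WebForms pages (which the .aspx address suggests this is), __doPostBack fills two hidden form fields, __EVENTTARGET and __EVENTARGUMENT, and submits the page's form back to the server. The sketch below shows one way such a click could be replayed without a browser: it collects the hidden inputs from the current page and adds the two event fields taken from the href. The helper names (HiddenFieldParser, parse_postback_href, build_postback_payload) are hypothetical, and the assumption that the page carries its state in hidden inputs such as __VIEWSTATE has not been checked against this site.

```python
import re
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def parse_postback_href(href):
    """Return (event_target, event_argument) from a javascript:__doPostBack(...) href."""
    match = re.search(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)", href)
    if match is None:
        raise ValueError("not a __doPostBack link: %r" % href)
    return match.group(1), match.group(2)


def build_postback_payload(page_html, href):
    """Build the form data that a click on the given postback link would submit."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)  # e.g. __VIEWSTATE, if the page has it
    target, argument = parse_postback_href(href)
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload
```

With a logged-in requests.Session called session and the HTML of the current page in r.text, the click might then be replayed as session.post(r.url, data=build_postback_payload(r.text, href)). Whether the server accepts this depends on the site; the payload has to be rebuilt from each new response, because the hidden state fields usually change from page to page.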

I would be grateful for any help or suggestions.

Thanks to chris, I was able to complete this.

First, using the requests library, I got the redirect URL:

redirectURL = requests.get(url).url

After that I used scrapy and selenium to click the links and download the files. With selenium added to the browser as an add-in/plugin, it was quite simple.
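The exact selenium code is not part of this post. As a rough illustration only, a scripted version using selenium's Python bindings could look like the sketch below. It finds the menu link by the control name inside its href and clicks it in a real browser, so the page's own JavaScript performs the postback. The function names are hypothetical, the login step is site-specific and left out, and Chrome with a matching driver is assumed to be installed.

```python
def postback_link_xpath(event_target):
    """XPath for an <a> whose href mentions the given __doPostBack target."""
    if '"' in event_target:
        raise ValueError("event target must not contain double quotes")
    return '//a[contains(@href, "%s")]' % event_target


def click_postback_link(driver, event_target):
    """Click the postback link for event_target in an open selenium session."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    link = driver.find_element("xpath", postback_link_xpath(event_target))
    link.click()


def run(start_url, event_target):
    """Open the site in Chrome and click one postback link (login omitted)."""
    # Imported here so the helpers above can be loaded without selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(start_url)  # the browser follows the redirect itself
        # ... log in here; the form fields depend on the site ...
        click_postback_link(driver, event_target)
    finally:
        driver.quit()
```

For the link in the question, the call would be run("https://abcd.com", "myAppControl$menu_itm_proj11"). Driving a browser is slower than posting the form directly, but it avoids having to reproduce the hidden state fields by hand.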

Your link javascript:__doPostBack('myAppControl$menu_itm_proj11','') has the form "__doPostBack('EVENTTARGET', 'EVENTARGUMENT')"; see https://stackoverflow.com/questions/33595858/python-sending-dopostback-to-join-groups-in-roblox - QHarr, Aug 02 '19 at 22:51
@chrispbacon: thank you so much for showing me the direction. scrapy + selenium was quite simple - SarahB, Aug 07 '19 at 17:53
