First of all, requests is not a browser and has no built-in JavaScript engine.
However, you can mimic the underlying logic by inspecting what happens in the browser when you click "Accept". This is where the browser developer tools come in handy.
Clicking "Accept" in the first Accept/Decline popup sets an "accepted=true" cookie. As for the second "Accept", here is how the button link looks in the source code:
<a href="javascript:agree()">
<img src="/static/images/investor/en/buttons/accept_Btn.jpg" alt="Accept" title="Accept">
</a>
Clicking the button calls the agree() function. Here is what it does:
function agree() {
    $("form[name='agreeFrom']").submit();
}
In other words, the agreeFrom form is submitted. This form is hidden, but you can find it in the source code:
<form name="agreeFrom" action="/investor/en/important-notice.page?submit=true&componentID=1298599783876" method="post">
<input value="Agree" name="iwPreActions" type="hidden">
<input name="TargetPageName" type="hidden" value="en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends">
<input type="hidden" name="FundId" value="10306">
</form>
We could submit this form with requests, but there is an easier option. If you click "Accept" and inspect which cookies are set, you'll notice that besides "accepted" there are 4 new cookies:
- "irdFundId" with a "FundId" value from the "FundId" form input or a value from the requested URL (see "?FundId=10306")
- "isAgreed=yes"
- "isExpand=true"
- "lastAgreedTime" with a timestamp
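The cookie list above can be collected in a small helper. This is only a sketch: the function name `consent_cookies` and its `now` parameter are made up for illustration, and it assumes the fund ID is taken from the `FundId` query parameter of the requested URL and that the timestamp is expressed in milliseconds.

```python
import time
from urllib.parse import parse_qs, urlparse


def consent_cookies(page_url, now=None):
    """Build the cookie dict described above for a URL containing ?FundId=..."""
    query = parse_qs(urlparse(page_url).query)
    fund_id = query["FundId"][0]  # raises KeyError if the URL has no FundId
    timestamp = time.time() if now is None else now
    return {
        "accepted": "true",
        "irdFundId": fund_id,
        "isAgreed": "yes",
        "isExpand": "true",
        # assumption: the timestamp is in milliseconds since the epoch
        "lastAgreedTime": str(int(timestamp * 1000)),
    }


if __name__ == "__main__":
    print(consent_cookies("https://www.fidelity.com.hk/investor/en/important-notice.page?FundId=10306"))
```

Passing `now` explicitly is only there to make the helper easy to check; in normal use you would leave it out.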
Let's use this information to build a solution using requests + BeautifulSoup (for the HTML parsing part):
import time

from bs4 import BeautifulSoup
import requests
from requests.cookies import cookiejar_from_dict

fund_id = '10306'
# "lastAgreedTime" is a timestamp in milliseconds
last_agreed_time = str(int(time.time() * 1000))
url = 'https://www.fidelity.com.hk/investor/en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends.page'

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'}
    # pre-set the cookies that appear after both "Accept" clicks
    session.cookies = cookiejar_from_dict({
        'accepted': 'true',
        'irdFundId': fund_id,
        'isAgreed': 'yes',
        'isExpand': 'true',
        'lastAgreedTime': last_agreed_time
    })

    response = session.get(url, params={'FundId': fund_id})
    # name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(response.content, 'html.parser')
    print(soup.title)
It prints:
Fidelity Funds - America Fund A-USD| Fidelity
which means we are seeing the desired page.
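For completeness, the other route mentioned earlier, submitting the hidden agreeFrom form directly, could look roughly like the sketch below. It is untested against the live site: whether the server accepts the POST without further cookies or headers is an assumption. The names `FormExtractor`, `extract_form` and `submit_agreement` are hypothetical helpers, and the standard-library `html.parser` is used here in place of BeautifulSoup only to keep the example self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The hidden form as it appears in the page source.
FORM_HTML = """
<form name="agreeFrom" action="/investor/en/important-notice.page?submit=true&componentID=1298599783876" method="post">
<input value="Agree" name="iwPreActions" type="hidden">
<input name="TargetPageName" type="hidden" value="en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends">
<input type="hidden" name="FundId" value="10306">
</form>
"""


class FormExtractor(HTMLParser):
    """Collect the action URL and the input values of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.action = None
        self.fields = {}
        self._inside = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self._inside = True
            self.action = attrs.get("action")
        elif tag == "input" and self._inside and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside = False


def extract_form(html, form_name, base_url):
    """Return (absolute action URL, dict of form fields)."""
    parser = FormExtractor(form_name)
    parser.feed(html)
    if parser.action is None:
        raise ValueError("form %r not found" % form_name)
    return urljoin(base_url, parser.action), parser.fields


def submit_agreement(session, html, base_url):
    """POST the hidden form through an existing requests.Session."""
    action_url, fields = extract_form(html, "agreeFrom", base_url)
    return session.post(action_url, data=fields)
```

With a `requests.Session()` that has already fetched the notice page as `page`, the call would be `submit_agreement(session, page.text, page.url)`. The session then keeps whatever cookies the server sets in response, so a follow-up `session.get(...)` for the fund page would send them automatically.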