How can I reach the following webpage using Python Requests?

https://www.fidelity.com.hk/investor/en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends.page?FundId=10306

The site keeps redirecting me to a notice page until I click the two "Accept" buttons.

This is what I do:

import requests
s = requests.Session()
r = s.post("https://www.fidelity.com.hk/investor/en/important-notice.page?submit=true&componentID=1298599783876")
r = s.get("https://www.fidelity.com.hk/investor/en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends.page?&FundId=10306")

How do I handle the first "Accept" button? I have checked, and there seems to be a cookie called "Accepted". Am I correct? This is the button's markup:

<a id="terms_use_accept" class="btn btn-default standard-btn smallBtn" title="Accept" href="javascript:void(0);">Accept</a>
Terence Ng
  • Why do you need to deal with that? You can still see the HTML code "behind" that pop-up. – cdonts Mar 07 '15 at 03:42
  • @cdonts I have to click the 2 "Accept" buttons to forward to the page of historical fund price. – Terence Ng Mar 08 '15 at 04:13
  • Why don't you use the URL of the prices page instead of the one above? – cdonts Mar 08 '15 at 14:43
  • @cdonts I suppose I am using the URL of the prices page already. Please correct it if I am wrong! – Terence Ng Mar 08 '15 at 22:59
  • There are no direct modules to scrape dynamic web pages. You should use either ghost or selenium. http://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python . If you are not tied to Python, phantomjs can help. – Murali Mopuru Mar 10 '15 at 12:58

3 Answers

First of all, requests is not a browser and there is no JavaScript engine built-in.

But you can mimic the underlying logic by inspecting what goes on in the browser when you click "Accept". This is where browser developer tools come in handy.

If you click "Accept" in the first Accept/Decline "popup", an "accepted=true" cookie is set. As for the second "Accept", here is how the button link looks in the source code:

<a href="javascript:agree()">
    <img src="/static/images/investor/en/buttons/accept_Btn.jpg" alt="Accept" title="Accept">
</a>

If you click the button, the agree() function is called. Here is what it does:

function agree() {
    $("form[name='agreeFrom']").submit();
}

In other words, the agreeFrom form is submitted. This form is hidden, but you can find it in the source code:

<form name="agreeFrom" action="/investor/en/important-notice.page?submit=true&amp;componentID=1298599783876" method="post">
    <input value="Agree" name="iwPreActions" type="hidden">
    <input name="TargetPageName" type="hidden" value="en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends">
    <input type="hidden" name="FundId" value="10306">
</form>
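
To make the hidden form concrete, here is a minimal sketch of how the same POST could be prepared with requests. The URL, query string and field values are copied from the form above; `build_agree_request` is a hypothetical helper name, and the request is only prepared, not sent, so nothing here confirms how the server actually responds.

```python
import requests

# Action URL and query string taken from the agreeFrom form shown above.
NOTICE_URL = "https://www.fidelity.com.hk/investor/en/important-notice.page"


def build_agree_request(fund_id):
    """Prepare (but do not send) the POST that the hidden agreeFrom form makes.

    Hypothetical helper for illustration only.
    """
    request = requests.Request(
        "POST",
        NOTICE_URL,
        params={"submit": "true", "componentID": "1298599783876"},
        data={
            "iwPreActions": "Agree",
            "TargetPageName": "en/fund-prices-performance/fund-price-details/"
                              "factsheet-historical-nav-dividends",
            "FundId": fund_id,
        },
    )
    return request.prepare()


prepared = build_agree_request("10306")
print(prepared.method, prepared.url)
print(prepared.body)
```

Calling `session.post(...)` with the same `params` and `data` arguments would actually send it; whether the server then serves the prices page without the cookies discussed next is something to verify in the browser's network tab.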

We can submit this form with requests. But there is an easier option. If you click "Accept" and inspect which cookies are set, you'll notice that besides "accepted" there are 4 new cookies:

  • "irdFundId" with a "FundId" value from the "FundId" form input or a value from the requested URL (see "?FundId=10306")
  • "isAgreed=yes"
  • "isExpand=true"
  • "lastAgreedTime" with a timestamp

Let's use these cookies to build a solution using requests and BeautifulSoup (for the HTML parsing part):

import time

from bs4 import BeautifulSoup
import requests
from requests.cookies import cookiejar_from_dict


fund_id = '10306'
last_agreed_time = str(int(time.time() * 1000))
url = 'https://www.fidelity.com.hk/investor/en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends.page'

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'}
    session.cookies = cookiejar_from_dict({
        'accepted': 'true',
        'irdFundId': fund_id,
        'isAgreed': 'yes',
        'isExpand': 'true',
        'lastAgreedTime': last_agreed_time
    })

    response = session.get(url, params={'FundId': fund_id})

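    # name the parser explicitly to avoid the "no parser specified" warning
    soup = BeautifulSoup(response.content, 'html.parser')
    # print only the title text (Python 3 print function)
    print(soup.title.get_text(strip=True))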

It prints:

Fidelity Funds - America Fund A-USD| Fidelity

which means we are seeing the desired page.
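
If you want to check this programmatically rather than by eye, one possible approach is to look at the final URL of the response. This is only a sketch under an assumption: that a rejected request ends up on the `important-notice.page` URL seen in the form action above, via an ordinary HTTP redirect. `is_notice_page` is a hypothetical helper, not part of requests.

```python
from urllib.parse import urlparse


def is_notice_page(final_url):
    """Return True if the URL path points at the 'important-notice' page.

    Assumption: the site sends visitors who have not agreed to this path.
    """
    path = urlparse(final_url).path
    return path.endswith("/important-notice.page")


# Example URLs taken from the question and the form action.
notice = "https://www.fidelity.com.hk/investor/en/important-notice.page?submit=true"
prices = ("https://www.fidelity.com.hk/investor/en/fund-prices-performance/"
          "fund-price-details/factsheet-historical-nav-dividends.page?FundId=10306")

print(is_notice_page(notice))
print(is_notice_page(prices))
```

With the session code above, this would be called as `is_notice_page(response.url)`, since requests stores the URL reached after any redirects in `response.url`.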

alecxe
  • Thank you very much! What tools will you use if Python Requests is not compulsory? – Terence Ng Mar 12 '15 at 01:02
  • @TerenceNg glad to help. A solution based on `selenium` would be much easier and more transparent here, since it works at the browser level (which is usually also a limitation). Let me know if you are interested in this type of approach. – alecxe Mar 12 '15 at 01:13
  • @TerenceNg ok, posted as a separate answer. Check it out. – alecxe Mar 12 '15 at 03:24

You can also approach it with a browser automation tool called selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()  # could also be headless: webdriver.PhantomJS()
driver.get('https://www.fidelity.com.hk/investor/en/fund-prices-performance/fund-price-details/factsheet-historical-nav-dividends.page?FundId=10306')

# switch to the popup
frame = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "iframe.cboxIframe")))
driver.switch_to.frame(frame)

# click accept
accept = driver.find_element(By.LINK_TEXT, 'Accept')  # find_element_by_* helpers were removed in Selenium 4
accept.click()

# switch back to the main window
driver.switch_to.default_content()

# click accept
accept = driver.find_element(By.XPATH, '//a[img[@title="Accept"]]')
accept.click()

# wait for the page title to load
WebDriverWait(driver, 10).until(EC.title_is("Fidelity Funds - America Fund A-USD| Fidelity"))

# TODO: extract the data from the page
alecxe

You can't execute JavaScript with requests or the urllib modules. But based on my knowledge (which is not much), here is how I would solve this problem.

The site uses a specific cookie to know whether you have already accepted their policy. If you haven't, the server redirects you to the page shown in the image above. Look for that cookie with a browser add-on and set it manually, so the website shows you the content you're looking for.

Another way is to use Qt's built-in web browser (which uses WebKit), which lets you execute JavaScript code. Simply call evaluateJavaScript("agree();") and there you go.

Hope it helps.

cdonts