
I am attempting to scrape a website that requires a login and whose core data is loaded with JavaScript and XHR requests. I am using the requests-html library, but the render() function appears to have no effect on the page. Here is my code:

import requests_html as requests
import bs4 as bs

# variables...

# def createForm()...

with requests.HTMLSession() as session:
    print("retrieving page...")
    initial_response = session.get(login_url)

    print("logging in...")
    response = session.post(url=login_url, data=createForm(initial_response))
    page_html = session.get(target_url)

    page = bs.BeautifulSoup(page_html.content, 'lxml')
    html_before = page.prettify()

    print('rendering...')
    page_html.html.render(sleep=5)

    page_rendered = bs.BeautifulSoup(page_html.content, 'lxml')
    html_after = page_rendered.prettify()

    if html_before == html_after:
        print("they are the same")

This is the HTML returned (only the important parts):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>
   Home | Compass
  </title>
  <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
  <script src="/cdn-cgi/apps/head/nXBUbHOMoxcWCnUqQqrCuyGGJ4s.js">
  </script>

    Boring CSS...

  <meta content="text/html;charset=utf-8" http-equiv="Content-type"/>
 </head>
 <body class="greyBody">

    Dull JSON...

Compass.assemblyVersion = "11.44.1.0";Compass.isDev = false;Compass.organisationUserId = 1921;Compass.organisationUserSussiId = "SAGE.ALLEN";Compass.organisationUserBaseRole = 1;Compass.organisationUserRoles = { "AfterHoursAccess": true, "MyFilesBase": true, "StaffStudentsMisc": true, "StudentsMisc": true};Compass.schoolId = "shenton.wa.edu.au";Compass.schoolName = "Shenton College";Compass.schoolPrimaryFqdn = "shenton-wa.compass.education";Compass.headAncestorId = "shenton.wa.edu.au";Compass.hasChildOrganisations = false;Compass.isInHierarchy = false;Compass.isTargetingAncestry = false;
  </script>
  <a href="/Communicate/Documentation/Help.aspx" style="position: absolute; left: -999px">
   Help
  </a>
  <form action="./" id="aspnetForm" method="post">
   <div class="aspNetHidden">
    <input id="__EVENTTARGET" name="__EVENTTARGET" type="hidden" value=""/>
    <input id="__EVENTARGUMENT" name="__EVENTARGUMENT" type="hidden" value=""/>
    <input id="__VIEWSTATE" name="__VIEWSTATE" type="hidden" value="cUphbVG2sD46yFu7rFLU15w0eiJn+7KXkA6I6Cg/7RQ9m3rwlz5poc6KdcuOMApzHcafUPq70DbpviYl6V7vDYHgLMx23YF8OtMtdcmxVSk="/>
   </div>
   <script type="text/javascript">
    //<![CDATA[
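var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}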
//]]>
   </script>
   <script src="/WebResource.axd?d=pynGkmcFUV13He1Qd6_TZMBp8pi1aG3kj_Rrf_NckYpQU5qPM8p1FZ-Rik-uln5rcqPDnR_gxYalKXvDaBNyhg2&amp;t=636165368714134089" type="text/javascript">
   </script>
   <script src="/ScriptResource.axd?d=NJmAwtEo3Ipnlaxl6CMhvk3jMxAVfdhwj8EfOKm3TxozcZHxkgtaPL9w9WaPcaq30sskp_Glm4jiP922KJP1an86NqAQUdSFO5rhKIKoAuO5v3uoNlAezbrUkCluOH1LV_F9OB_HI13vUK6I2eQlLQ80jzjIESOQbg5oZuzg3A01&amp;t=ffffffffd416f7fc" type="text/javascript">
   </script>
   <script src="/ScriptResource.axd?d=dwY9oWetJoJoVpgL6Zq8OJy60eKvb9zs3HNOFEuh2HK-a1JlTWrINdUt4GmfnVpd-vC-hGQfNOA-_hpGAIQxLJ6TRvLcoTQZ7vzC5ouXwZ7EB1Rqgo_p4dWNsoX1AAW-I0gKht_6IBwAHOTP4LV38H7v4PjwKJBs7h2NgozR47s1&amp;t=ffffffffd416f7fc" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/System/Scripts/4ed8095_javascript-resource-manager.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Common/Scripts/62cdce8_utility.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/System/Scripts/7c66c7e_ravenjs-loader.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/ext-js4.2.2/5de6c0f_ext-all.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/ce7ba4b_jquery-1.8.3.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Common/Scripts/81a11e3_autosuggest-widget.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/ef94bb5_jquery-json-2.3.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/5fee56b_jquery.elastic.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/b9aa653_jquery.simplemodal.1.4.3.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/moment/cdeefcf_moment-and-data.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/ext-js4.2.2/resources/js/8f9f704_ext-extensions-and-theme.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Common/Scripts/d3ac6df_impersonate-widget.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Common/Scripts/0d786b6_compass.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/System/Scripts/bb8b963_request-capture.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/System/Scripts/17975e6_external-resource-monitor.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/ckeditor/0d3caaa_ckeditor.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/Calendar/Scripts/625070f_calendar-and-extensions.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/PageScripts/583fb06_HomePage.Chronicle.min.js" type="text/javascript">
   </script>
   <script src="https://assets.compass.education/StaticAssetsK/PageScripts/9571a67_HomePage.min.js" type="text/javascript">
   </script>
   <script type="text/javascript">
    //<![CDATA[
Sys.WebForms.PageRequestManager._initialize('ctl00$ctl04', 'aspnetForm', [], [], [], 90, 'ctl00');
//]]>

       </script>
       <script type="text/javascript">
    });ersonateWindow.show();xt.create('Compass.widgets.ImpersonateWidget', 

{});, function(e) {ggestions?sessionstate=readonly", 

Unremarkable HTML...

Encrypted:6SeqWGbwjzN6ZfMnVVAU1sXLmHGC06o6K+A6lAhEBFCLQgeQq6ZU810mqSzy0zNyMwUhKnrAlYfvvlTuy5xpIj4OkW4pGBLFN6PVai3RoevYQkgbvy9vqVBanzrNVfRGsMIE8kgq+8pJGtNiCveqQAvzLfhgHhm5QQ8/k4ShskzjZdRPX9MUNpa-->kHWHQOxCM73dFIgYrWM6PexC+wA31RdtyPTEp7gRCb7ulIlQFSKresH2xPmdHNeLhA7mCefNrbBDMG7eJ5kqhLsh3QqbxMQ1IABdA42nGGSdw1GFkmRJYS06mNS4Cjp44cmQBt
           <script src="https://assets.compass.education/StaticAssetsK/Scripts/Lib/e0c3e6b_LazyLoad.min.js" type="text/javascript">
           </script>
           <script type="text/javascript">
           </script>
           <div class="aspNetHidden">
            <input id="__VIEWSTATEGENERATOR" name="__VIEWSTATEGENERATOR" type="hidden" value="CA0B0334"/>
           </div>
          </form>
         </body>
        </html>

I have not managed to decipher all of the scripts, as I am not experienced in JavaScript, but they appear to be fetching the data. I would appreciate any explanation of why these scripts are not running, or any alternative solution that is adequately fast.
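
One thing that may be worth checking, stated as an assumption rather than a confirmed diagnosis: in requests-html, `render()` updates the `response.html` object, while `response.content` keeps the bytes that were originally downloaded. Parsing `.content` before and after rendering would then compare the same markup. A minimal sketch of reading both versions follows; the helper name `html_before_and_after` is hypothetical, and `response` is assumed to be what `HTMLSession.get()` returns:

```python
def html_before_and_after(response, sleep=5):
    """Return (raw, rendered) markup for a requests-html response."""
    # .content holds the bytes received over HTTP; render() does not rewrite it.
    raw = response.content.decode(response.encoding or "utf-8", errors="replace")

    # render() loads the page in headless Chromium and updates response.html in place.
    response.html.render(sleep=sleep)

    # The rendered markup is exposed through the .html attribute of that object.
    rendered = response.html.html
    return raw, rendered
```

Passing `rendered` instead of `page_html.content` to BeautifulSoup would then show whether the scripts changed anything. Even if it does differ, the headless browser may not share the login cookies held by the session, so the rendered page could still need checking for a login redirect.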

S. Allen
  • @ewwink The problem is not in the initial response; retrieving the page HTML works fine, and all of the JavaScript code is present in it. My problem is with rendering the JavaScript-driven elements of the page, specifically why the `render()` function does not render these elements. – S. Allen Nov 30 '18 at 14:08
