
I am trying to get data from a website that requires either being in a certain location or logging in. My problem is that it seems to redirect to a new session that I am unable to access programmatically with Python. This is how I tried to access it:

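import requests

# Placeholder credentials for the login form
payload = {
    'user': 'myusername',
    'pass': 'mypassword',
}

session = requests.session()
# The SID query parameter comes from an earlier browser session
url = ("http://apps.webofknowledge.com/WOS_CitedReferenceSearch_input.do"
       "?SID=1DtxhgpRsI16gPP7tRC&product=WOS&search_mode=CitedReferenceSearch")
r = session.post(url, data=payload)

print(r.text)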

This results in the following output, which indicates that the redirect is not being followed:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><!-- !DOCTYPE HTML PUBLIC "-/W3C/DTD HTML 4.01 Transitional/EN" -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href='http://login.webofknowledge.com/error/WOK5/WoKcommon.css' type="text/css" />
<link rel="stylesheet" href='http://login.webofknowledge.com/error/WOK5/main.css' type="text/css" />
<script language="javascript" src="http://login.webofknowledge.com/error/WOK5/jquery.js"></script>
<script language="javascript" src="http://login.webofknowledge.com/error/WOK5/main.js"></script>
<title>Web of Science - Starting New Session...</title>
    <script>
      function autoredirect() {
        var s = "true";
        document.cookie = "SID=1; expires=15/02/2000 00:00:00; domain=www.webofknowledge.com";
        if (false == s)
        {
            setTimeout("this.form.submit()", null);
        }
        else
        {             
            setTimeout("top.location.href='http://www.webofknowledge.com?'", null);
        }        

      }      
    </script>
</head>

<body id="WoKerror" onload="javascript:autoredirect()">


  <form action='http://www.webofknowledge.com'>


<div class="main-container">




<div class="navBar clearfix">
  <ul class="userCabinet nav-list">
    <li class="nav-item">
      <a title="" class="nav-link" href="javascript: void(0)">English <i class="icon-arrow"></i></a>
      <ul class="subnav">




              <li class="subnav-item">
               <a class="subnav-link" title="简体中文" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=zh_CN"> 简体中文</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="繁體中文" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=zh_TW"> 繁體中文</a>
              </li>





              <li class="subnav-item language-active-option">
               <a class="subnav-link" title="English" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=en_US"> English</a>
              </li>







              <li class="subnav-item">
               <a class="subnav-link" title="日本語" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ja"> 日本語</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="한국어" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ko_KR"> 한국어</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="Português" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=pt_BR"> Português</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="Español" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=es_LA"> Español</a>
              </li>






              <li class="subnav-item">
               <a class="subnav-link" title="Pусский" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ru_RU"> Pусский</a>
              </li>



      </ul>
    </li>
  </ul>
</div>  
<div class="logoBar">
  <h1 class="titleh1"><a href="http://www.webofknowledge.com/"> <span title="Web of Science">Web of Science</span> </a></h1>
  <span><img alt="Clarivate Analytics" title="Clarivate Analytics" src="http://login.webofknowledge.com/error/WOK5/images/trlogo.png" /></span>
</div>


<!-- Begin : Module Title Shell -->
<table border="0" cellpadding="0" cellspacing="0" width="100%">
  <tbody>
    <tr>
      <td class="NEWleftOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWleftOuterEdge" width="8"></td>
      <td class="NEWwokErrorContainer">
          <div class="NEWpageTitle"><H1>Thank you for using Web of Science</H1></div>
     </td>
      <td class="NEWrightOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWrightOuterEdge"></td>
    </tr>
  </tbody>
</table>
<!-- End : Module Title Shell -->

<!-- Begin : WoK Error Shell -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" valign="top">
  <tr>
    <td class="NEWleftOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" width="8" class="NEWleftOuterEdge" /></td>
    <td class="NEWwokErrorContainer SignInLeftColumn ">
       <!-- Begin : Error -->
       <h2>STARTING A NEW SESSION...</h2>
       <p>


       <p>If a new session is not started automatically in a few seconds, click <a href="http://www.webofknowledge.com?" target="_top">establish a new session</a>.


       <!-- End : Error --></td>
    <td class="NEWrightOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWrightOuterEdge" /></td>
  </tr>
</table>
<!-- End : WoK Error Shell -->
 </form>  




<div id="skip-to-footer" class="footer">
  <div class="footerContent">
    <ul>
      <li><span>&copy; 2017</span>&nbsp;<a id="TRcopyright" title="Clarivate Analytics" href="http://clarivate.com" name="Clarivate Analytics" target="_new"> Clarivate Analytics</a></li>
      <li><a id="TRpolicy" title="Terms of Use" href="http://wokinfo.com/terms" name="Terms of Use" target="_new"> Terms of Use</a></li>
      <li><a id="TRprivacy" title="Privacy Policy" href="http://ip-science.thomsonreuters.com/privacy" name="Privacy Policy" target="_new"> Privacy Policy</a></li>
      <li><a id="TRfeedback" title="Feedback" href="http://science.thomsonreuters.com/info/wokfeedback" name="Feedback" target="_new"> Feedback</a></li>
    </ul>
  </div>
</div>


</body></html>

I have also tried sending the POST request first and then sending a GET request to the URL that my browser is redirected to when I do the same thing manually, passing in the cookies from the POST request. However, r.cookies is empty after the POST, so I end up with the same HTML output as shown above. The issue seems to be that I cannot follow the redirect from Python into the new session that the website initiates.
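One possible explanation for the empty r.cookies, offered as a sketch rather than a confirmed diagnosis: the page above assigns its cookie in JavaScript through `document.cookie`, and requests does not execute scripts, so that assignment never reaches the session's cookie jar. The helper below (`find_js_cookies` is a hypothetical name, not part of requests) only scans the returned HTML for such assignments so they can be inspected.

```python
import re

# Matches assignments such as: document.cookie = "SID=1; expires=...";
_JS_COOKIE_RE = re.compile(r"""document\.cookie\s*=\s*(["'])(.*?)\1""")


def find_js_cookies(html):
    """Return {name: value} for cookies assigned via document.cookie in html."""
    cookies = {}
    for match in _JS_COOKIE_RE.finditer(html):
        # The first "name=value" pair is the cookie itself; what follows
        # (expires, domain, ...) are attributes and are ignored here.
        pair = match.group(2).split(";", 1)[0]
        name, sep, value = pair.partition("=")
        if sep:
            cookies[name.strip()] = value.strip()
    return cookies


# Line taken from the response shown above
sample = 'document.cookie = "SID=1; expires=15/02/2000 00:00:00; domain=www.webofknowledge.com";'
print(find_js_cookies(sample))  # → {'SID': '1'}
```

Note that the expiry date in that assignment lies in the past, which suggests the script may be clearing a stale SID cookie rather than setting a usable one.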

Min
  • You're creating the variable `session` and then using `requests.post`. Try using `session.post` instead. Also, use a context manager (`with requests.session() as session: # do stuff`). – Evya Dec 11 '17 at 20:43
  • This provides the same exact result. It does not redirect to the session, but instead gives me the html for the redirect page. – Min Dec 11 '17 at 21:57
  • I think a JS function executes the redirect (on form submission, as seen on the response page). Can you post `r.history` and `session.cookies` after you make the request? – Evya Dec 11 '17 at 22:14
  • It uses a JavaScript redirect with a cookie set by JavaScript. Try using something that renders JavaScript; see my answer to https://stackoverflow.com/questions/45259232/scraping-google-finance-beautifulsoup/45259523#45259523 for 3 ways to do this. – Dan-Dev Dec 11 '17 at 22:18
  • Yep, @Dan-Dev is right. But can't he imitate the final request with the cookie that the JS sets? – Evya Dec 11 '17 at 22:27
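The idea raised in the last comments, imitating the JavaScript redirect by hand, could be sketched as follows. This is a minimal illustration under stated assumptions, not a tested solution for this site: `extract_js_redirect` and `fetch_following_js_redirects` are hypothetical helper names, the regular expression only recognises simple `location = '...'` or `location.href = '...'` assignments, and it is assumed that the redirect target issues a fresh session over plain HTTP once it is requested with the same `requests` session.

```python
import re
from urllib.parse import urljoin

# Matches e.g. top.location.href='http://example.com/' or window.location = "/home"
_JS_REDIRECT_RE = re.compile(r"""\blocation(?:\.href)?\s*=\s*(["'])(.*?)\1""")


def extract_js_redirect(html, base_url):
    """Return the absolute URL of the first JS location assignment, or None."""
    match = _JS_REDIRECT_RE.search(html)
    if match is None:
        return None
    # Resolve relative targets against the page they were found on
    return urljoin(base_url, match.group(2))


def fetch_following_js_redirects(session, url, max_hops=5):
    """GET url, then follow up to max_hops JS redirects found in the HTML.

    `session` is expected to behave like requests.Session: it needs a
    get(url) method returning an object with .text and .url attributes.
    """
    response = session.get(url)
    for _ in range(max_hops):
        target = extract_js_redirect(response.text, response.url)
        if target is None:
            break  # no further JS redirect on this page
        response = session.get(target)
    return response
```

With requests installed, usage would look something like `fetch_following_js_redirects(requests.session(), "http://www.webofknowledge.com")`, after which `session.cookies` can be checked for a new SID. If the site needs more script execution than a single location assignment, a tool that renders JavaScript, as suggested above, is likely the more robust route.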
