I am trying to get data from a website that requires either being in a certain location or logging in. My problem is that the site seems to redirect to a new session that I cannot access programmatically with Python. This is how I tried to access it:
import requests

# Credentials sent as form fields in the POST body
payload = {
    'user': 'myusername',
    'pass': 'mypassword',
}
# A Session persists cookies across requests
session = requests.Session()
r = session.post(
    "http://apps.webofknowledge.com/WOS_CitedReferenceSearch_input.do?SID=1DtxhgpRsI16gPP7tRC&product=WOS&search_mode=CitedReferenceSearch",
    data=payload,
)
print(r.text)
This produces the following output, which indicates that the redirect is not being captured properly:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><!-- !DOCTYPE HTML PUBLIC "-/W3C/DTD HTML 4.01 Transitional/EN" -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href='http://login.webofknowledge.com/error/WOK5/WoKcommon.css' type="text/css" />
<link rel="stylesheet" href='http://login.webofknowledge.com/error/WOK5/main.css' type="text/css" />
<script language="javascript" src="http://login.webofknowledge.com/error/WOK5/jquery.js"></script>
<script language="javascript" src="http://login.webofknowledge.com/error/WOK5/main.js"></script>
<title>Web of Science - Starting New Session...</title>
<script>
function autoredirect() {
var s = "true";
document.cookie = "SID=1; expires=15/02/2000 00:00:00; domain=www.webofknowledge.com";
if (false == s)
{
setTimeout("this.form.submit()", null);
}
else
{
setTimeout("top.location.href='http://www.webofknowledge.com?'", null);
}
}
</script>
</head>
<body id="WoKerror" onload="javascript:autoredirect()">
<form action='http://www.webofknowledge.com'>
<div class="main-container">
<div class="navBar clearfix">
<ul class="userCabinet nav-list">
<li class="nav-item">
<a title="" class="nav-link" href="javascript: void(0)">English <i class="icon-arrow"></i></a>
<ul class="subnav">
<li class="subnav-item">
<a class="subnav-link" title="简体中文" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=zh_CN"> 简体中文</a>
</li>
<li class="subnav-item">
<a class="subnav-link" title="繁體中文" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=zh_TW"> 繁體中文</a>
</li>
<li class="subnav-item language-active-option">
<a class="subnav-link" title="English" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=en_US"> English</a>
</li>
<li class="subnav-item">
<a class="subnav-link" title="日本語" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ja"> 日本語</a>
</li>
<li class="subnav-item">
<a class="subnav-link" title="한국어" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ko_KR"> 한국어</a>
</li>
<li class="subnav-item">
<a class="subnav-link" title="Português" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=pt_BR"> Português</a>
</li>
<li class="subnav-item">
<a class="subnav-link" title="Español" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=es_LA"> Español</a>
</li>
<li class="subnav-item">
<a class="subnav-link" title="Pусский" href="http://login.webofknowledge.com/error/Error?DestApp=WOS&SID=1DtxhgpRsI16gPP7tRC&Error=Server.sessionNotFound&locale=ru_RU"> Pусский</a>
</li>
</ul>
</li>
</ul>
</div>
<div class="logoBar">
<h1 class="titleh1"><a href="http://www.webofknowledge.com/"> <span title="Web of Science">Web of Science</span> </a></h1>
<span><img alt="Clarivate Analytics" title="Clarivate Analytics" src="http://login.webofknowledge.com/error/WOK5/images/trlogo.png" /></span>
</div>
<!-- Begin : Module Title Shell -->
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td class="NEWleftOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWleftOuterEdge" width="8"></td>
<td class="NEWwokErrorContainer">
<div class="NEWpageTitle"><H1>Thank you for using Web of Science</H1></div>
</td>
<td class="NEWrightOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWrightOuterEdge"></td>
</tr>
</tbody>
</table>
<!-- End : Module Title Shell -->
<!-- Begin : WoK Error Shell -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" valign="top">
<tr>
<td class="NEWleftOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" width="8" class="NEWleftOuterEdge" /></td>
<td class="NEWwokErrorContainer SignInLeftColumn ">
<!-- Begin : Error -->
<h2>STARTING A NEW SESSION...</h2>
<p>
<p>If a new session is not started automatically in a few seconds, click <a href="http://www.webofknowledge.com?" target="_top">establish a new session</a>.
<!-- End : Error --></td>
<td class="NEWrightOuterEdge"><img src="http://login.webofknowledge.com/error/WOK5/images/spacer.gif" class="NEWrightOuterEdge" /></td>
</tr>
</table>
<!-- End : WoK Error Shell -->
</form>
<div id="skip-to-footer" class="footer">
<div class="footerContent">
<ul>
<li><span>© 2017</span> <a id="TRcopyright" title="Clarivate Analytics" href="http://clarivate.com" name="Clarivate Analytics" target="_new"> Clarivate Analytics</a></li>
<li><a id="TRpolicy" title="Terms of Use" href="http://wokinfo.com/terms" name="Terms of Use" target="_new"> Terms of Use</a></li>
<li><a id="TRprivacy" title="Privacy Policy" href="http://ip-science.thomsonreuters.com/privacy" name="Privacy Policy" target="_new"> Privacy Policy</a></li>
<li><a id="TRfeedback" title="Feedback" href="http://science.thomsonreuters.com/info/wokfeedback" name="Feedback" target="_new"> Feedback</a></li>
</ul>
</div>
</div>
</body></html>
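Note that the final hop in the page above is performed by JavaScript (the `autoredirect()` function sets `top.location.href`), and `requests` does not execute JavaScript. As a minimal sketch using only the standard library, the page could be recognized and its redirect target extracted as shown here. The helper names `find_js_redirect` and `is_session_error` are hypothetical, and the regular expression assumes the redirect is written exactly as in the output above.

```python
import re


def find_js_redirect(html):
    """Return the URL assigned to top.location.href in the page, or None."""
    match = re.search(r"top\.location\.href\s*=\s*'([^']*)'", html)
    return match.group(1) if match else None


def is_session_error(html):
    """True if the page links to the Server.sessionNotFound error seen above."""
    return "Server.sessionNotFound" in html


# Line copied from the autoredirect() function in the output above
sample = '''setTimeout("top.location.href='http://www.webofknowledge.com?'", null);'''
print(find_js_redirect(sample))  # prints http://www.webofknowledge.com?
```

With the real response, `find_js_redirect(r.text)` would give the URL the browser is sent to next, which could then be requested with the same session.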
I have also tried sending the POST request first and then sending a GET request to the URL my browser is redirected to for the same operation, passing along the cookies from the POST response. However, r.cookies is empty, so I end up with the same HTML output as shown above. The issue seems to be that I cannot follow the redirect from Python into the new session the website initiates.
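For reference, the two-step idea described above might be sketched roughly as follows. This rests on an unconfirmed assumption: that the entry page redirects to a URL carrying a fresh `SID` in its query string. The names `ENTRY_URL`, `extract_sid`, `build_search_url`, and `fetch_with_fresh_session` are illustrative only, and the login step is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENTRY_URL = "http://www.webofknowledge.com"
SEARCH_URL = "http://apps.webofknowledge.com/WOS_CitedReferenceSearch_input.do"


def extract_sid(url):
    """Return the SID query parameter of a URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("SID")
    return values[0] if values else None


def build_search_url(sid):
    """Rebuild the search URL from the question around the given SID."""
    params = {"SID": sid, "product": "WOS", "search_mode": "CitedReferenceSearch"}
    return SEARCH_URL + "?" + urlencode(params)


def fetch_with_fresh_session():
    """Open the entry page, then reuse whatever SID the server handed out."""
    import requests  # third-party; imported here so the helpers above run without it

    session = requests.Session()
    first = session.get(ENTRY_URL)  # HTTP redirects are followed by default
    for hop in first.history:       # each intermediate redirect response
        print(hop.status_code, hop.url)
    sid = extract_sid(first.url)    # assumption: the final URL carries the new SID
    if sid is None:
        return None
    return session.get(build_search_url(sid))
```

Printing `first.history` and `session.cookies` (rather than `r.cookies`, which only holds cookies set by that single response) shows whether any redirect or cookie was actually received.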