I am running a small script that is mostly checking small things on a website. Today I've come across a really interesting situation I've never seen before, which is the webpage i'm going to thinks Javascript is disabled. This is only happening in PhantomJS
, but works fine in Chromedriver
. I've even tried changing the driver's headers to ones similiar to Chrome, but still no luck. Is there anyway to get this page to work in PhantomJS
without having to use ChromeDriver
and PyVirtualDisplay
? i'm running the code on Ubuntu Server and would rather not use the extra system resources of having to use them. I've also tried running driver.save_screenshot()
, but it's returning a blank image since there is no content of the page being displayed.
simple code to reproduce the problem:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
service_args = [
'--ignore-ssl-errors=true',
'--ssl-protocol=any'
]
capabilities = dict(DesiredCapabilities.PHANTOMJS)
capabilities['phantomjs.page.settings.userAgent'] = ('Mozilla/5.0 (X11; Linux x86_64) '
'AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/60.0.3112.113 Safari/537.36)')
driver = webdriver.PhantomJS(desired_capabilities=capabilities, service_args=service_args)
driver.get(EDIT: URL REMOVED)
print driver.page_source
html response:
<!DOCTYPE html><html class=" no-blobworkers adownload applicationcache no-audiodata no-webaudio no-audio no-lowbattery no-batteryapi no-battery-api blobconstructor blob-constructor canvas todataurljpeg todataurlpng no-todataurlwebp canvastext contenteditable no-contentsecuritypolicy no-contextmenu cookies cors cssanimations backgroundcliptext bgpositionshorthand bgpositionxy bgrepeatround bgrepeatspace backgroundsize bgsizecover borderimage borderradius boxshadow boxsizing csscalc checked csscolumns cubicbezierrange displayrunin display-runin displaytable display-table ellipsis cssfilters flexbox flexboxlegacy no-flexboxtweener fontface generatedcontent cssgradients hsla lastchild cssmask mediaqueries multiplebgs no-objectfit no-object-fit opacity no-overflowscrolling csspointerevents csspositionsticky no-csspseudoanimations csstransitions no-csspseudotransitions cssreflections regions cssremunit cssresize rgba cssscrollbar shapes siblinggeneral subpixelfont no-supports textshadow csstransforms csstransforms3d userselect cssvhunit cssvmaxunit cssvminunit cssvwunit no-wrapflow no-customprotocolhandler no-dart dataview classlist no-createelementattrs no-createelement-attrs dataset no-microdata draganddrop datalistelem details outputelem progressbar meter ruby no-time no-texttrackapi no-track no-emoji no-strictmode no-contains no-devicemotion no-deviceorientation filereader no-filesystem fileinput formattribute no-localizednumber placeholder no-speechinput no-formvalidation fullscreen gamepads no-geolocation hashchange history no-ie8compat sandbox seamless srcdoc indexeddb json olreversed no-mathml no-lowbandwidth eventsource xhr2 xhrresponsetypearraybuffer xhrresponsetypeblob xhrresponsetypedocument no-xhrresponsetypejson xhrresponsetypetext xhrresponsetype notification pagevisibility performance no-pointerevents no-pointerlock postmessage no-quotamanagement requestanimationframe raf scriptasync scriptdefer localstorage sessionstorage websqldatabase no-stylescoped svgclippaths svgfilters inlinesvg smil svg touchevents typedarrays unicode no-userdata no-vibrate no-video no-webintents no-webgl no-getusermedia no-peerconnection websocketsbinary websockets no-framed sharedworkers webworkers no-dataworkers no-exiforientation no-apng no-webplossless no-webp svgasimg datauri" style=""><head>
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<meta http-equiv="CacheControl" content="no-cache">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="shortcut icon" href="data:;base64,iVBORw0KGgo=">
<script>
(function(){
var securemsg;
var dosl7_common;
window["bobcmn"] = "111111101010102000000052000000052000000002324c533c200000096300000000300000000300000006/TSPD/300000008TSPD_101300000005https200000000200000000";
window.cML=!!window.cML;try{(function(){try{var jj,Jj,Lj=1,sj=1,Sj=1,ij=1;for(var Ij=0;Ij<Jj;++Ij)Lj+=2,sj+=2,Sj+=2,ij+=3;jj=Lj+sj+Sj+ij;window.JO===jj&&(window.JO=++jj)}catch(OJ){window.JO=jj}var zJ=!0;function ZJ(J){J&&(zJ=!1,document.cookie="brav=ad");return zJ}function iJ(){}ZJ(window[iJ.name]===iJ);ZJ("function"!==typeof ie9rgb4);ZJ(/\x3c/.test(function(){return"\x3c"})&!/x3d/.test(function(){return"'x3'+'d';"}));
var IJ=window.attachEvent||/mobi/i.test(window["\x6e\x61vi\x67a\x74\x6f\x72"]["\x75\x73e\x72A\x67\x65\x6et"]),ol=+new Date+6E5,_l,Il,jL=setTimeout,JL=IJ?3E4:6E3;function LL(){if(!document.querySelector)return!0;var J=+new Date,O=J>ol;if(O)return ZJ(!1);O=Il&&_l+JL<J;O=ZJ(O);_l=J;Il||(Il=!0,jL(function(){Il=!1},1));return O}LL();var OL=[17795081,27611931586,1558153217];
function zL(J){J="string"===typeof J?J:J.toString(36);var O=window[J];if(!O.toString)return;var Z=""+O;window[J]=function(J,Z){Il=!1;return O(J,Z)};window[J].toString=function(){return Z}}for(var sL=0;sL<OL.length;++sL)zL(OL[sL]);ZJ(!1!==window.cML);
(function(){var J=-1,J={_:++J,oZ:"false"[J],J:++J,Il:"false"[J],oj:++J,L0:"[object Object]"[J],IL:(J[J]+"")[J],JL:++J,iL:"true"[J],Jj:++J,Zj:++J,OZ:"[object Object]"[J],S:++J,ij:++J,oLj:++J,LLj:++J};try{J.il=(J.il=J+"")[J.Zj]+(J.Ol=J.il[J.J])+(J.LZ=(J.ol+"")[J.J])+(!J+"")[J.JL]+(J.zl=J.il[J.S])+(J.ol="true"[J.J])+(J.Jo="true"[J.oj])+J.il[J.Zj]+J.zl+J.Ol+J.ol,J.LZ=J.ol+"true"[J.JL]+J.zl+J.Jo+J.ol+J.LZ,J.ol=J._[J.il][J.il],J.ol(J.ol(J.LZ+'"\\'+J.J+J.Zj+J.J+J.oZ+"\\"+J.Jj+J._+"("+J.zl+"\\"+J.J+J.ij+
J.J+"\\"+J.J+J.S+J._+J.iL+J.Ol+J.oZ+"\\"+J.Jj+J._+"\\"+J.J+J.S+J.ij+"\\"+J.J+J.Zj+J.J+"\\"+J.J+J.Zj+J.S+J.IL+J.Ol+"\\"+J.J+J.S+J.ij+"['\\"+J.J+J.S+J._+J.Il+"\\"+J.J+J.ij+J.J+"false"[J.oj]+J.Ol+J.Il+J.IL+"']\\"+J.Jj+J._+"===\\"+J.Jj+J._+"'\\"+J.J+J.S+J.JL+J.zl+"\\"+J.J+J.S+J.oj+"\\"+J.J+J.Zj+J.J+"\\"+J.J+J.Zj+J.S+"\\"+J.J+J.Jj+J.ij+"')\\"+J.Jj+J._+"{\\"+J.J+J.oj+"\\"+J.J+J.J+"\\"+J.J+J.S+J.S+J.Il+"\\"+J.J+J.S+J.oj+"\\"+J.Jj+J._+J.iL+J.IL+"\\"+J.J+J.S+J.S+J.OZ+"\\"+J.J+J.ij+J.J+J.Jo+"\\"+J.J+J.Zj+J.oj+
"\\"+J.J+J.Zj+J.JL+"\\"+J.J+J.S+J._+"\\"+J.Jj+J._+"=\\"+J.Jj+J._+"\\"+J.J+J.S+J.ij+"\\"+J.J+J.Zj+J.J+"\\"+J.J+J.Zj+J.S+J.IL+J.Ol+"\\"+J.J+J.S+J.ij+"['\\"+J.J+J.S+J._+J.Il+"\\"+J.J+J.ij+J.J+"false"[J.oj]+J.Ol+J.Il+J.IL+"'].\\"+J.J+J.S+J.oj+J.iL+"\\"+J.J+J.S+J._+"false"[J.oj]+J.Il+J.OZ+J.iL+"(/.{"+J.J+","+J.Jj+"}/\\"+J.J+J.Jj+J.ij+",\\"+J.Jj+J._+J.oZ+J.Jo+"\\"+J.J+J.Zj+J.S+J.OZ+J.zl+"\\"+J.J+J.Zj+J.J+J.Ol+"\\"+J.J+J.Zj+J.S+"\\"+J.Jj+J._+"(\\"+J.J+J.ij+J._+")\\"+J.Jj+J._+"{\\"+J.J+J.oj+"\\"+J.J+J.J+
"\\"+J.J+J.J+"\\"+J.J+J.J+"\\"+J.J+J.S+J.oj+J.iL+J.zl+J.Jo+"\\"+J.J+J.S+J.oj+"\\"+J.J+J.Zj+J.S+"\\"+J.Jj+J._+"(\\"+J.J+J.ij+J._+"\\"+J.Jj+J._+"+\\"+J.Jj+J._+"\\"+J.J+J.ij+J._+").\\"+J.J+J.S+J.JL+J.Jo+J.L0+"\\"+J.J+J.S+J.JL+J.zl+"\\"+J.J+J.S+J.oj+"("+J.oj+",\\"+J.Jj+J._+J.Jj+")\\"+J.J+J.oj+"\\"+J.J+J.J+"\\"+J.J+J.J+"});\\"+J.J+J.oj+"}\\"+J.J+J.oj+'"')())()}catch(O){J%=5}})();var SL=82;window.SZ={IZ:"0895a966bc0180002d019416d74a2e28d1f538ef3103146592d8c25a73dda892c7c585714f95500ba8b6beac1b79be4a3d61e8b7a80de2ffe8aa17af5acaa722530af851815bcaab86168951dee7b2ac8413c027a687d99e48318f014124304bb906e86573dd8e328c3b24cadaf832eea48f8634b3c6e9a0f49eee5235a376e326e984f99d888c10"};function l(J){return 812>J}
function L(J){var O=arguments.length,Z=[];for(var S=1;S<O;++S)Z.push(arguments[S]-J);return String.fromCharCode.apply(String,Z)}function z(J,O){J+=O;return J.toString(36)}(function(J){J||setTimeout(function(){if(!LL())return;var J=setTimeout(function(){},250);for(var Z=0;Z<=J;++Z)clearTimeout(Z);LL()},500)})(zJ);})();}catch(x){document.cookie='brav=oex'+x;}finally{ie9rgb4=void(0);};function ie9rgb4(a,b){return a>>b>>0};
})();
</script>
<script type="text/javascript" src="/TSPD/08e841a5c5ab20002a3554b194594e5f3375d2f994ac4de334932487e4817509e84bbe3658582b13?type=9"></script>
<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>
EDIT: Yes, I understand that PhantomJS is old, but it's what we have to use. My question is about getting something to work in PhantomJS, not about what alternatives are available. All of our servers run Ubuntu Server, including all server images, so we have to use headless browsing. Virtual Displays, such as PyVirtualDisplay and any other Xvfb routing method are too heavy on system resources. All of our codebase uses PhantomJS so as of right now I have to use it. As well, we use proxies with username and password authentication which Chrome has not supported, so the Headless Chrome option is out.
As well, i just tested this code with Headless Chrome and it also is not working.
python code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path='/path/to/chromedriver', chrome_options=chrome_options)
driver.get("https://www.EDIT-REMOVED.com")
print driver.page_source
driver.quit()
html response:
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>