
I am running a small script that mostly checks small things on a website. Today I came across a situation I've never seen before: the webpage I'm loading thinks JavaScript is disabled. This only happens in PhantomJS; the page works fine in ChromeDriver. I've even tried changing the driver's headers to ones similar to Chrome's, but still no luck. Is there any way to get this page to work in PhantomJS without having to use ChromeDriver and PyVirtualDisplay? I'm running the code on Ubuntu Server and would rather not spend the extra system resources they require. I've also tried running driver.save_screenshot(), but it returns a blank image since no page content is being displayed.

simple code to reproduce the problem:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

service_args = [
    '--ignore-ssl-errors=true',
    '--ssl-protocol=any'
]

capabilities = dict(DesiredCapabilities.PHANTOMJS)
capabilities['phantomjs.page.settings.userAgent'] = ('Mozilla/5.0 (X11; Linux x86_64) '
                                                     'AppleWebKit/537.36 (KHTML, like Gecko) '
                                                     'Chrome/60.0.3112.113 Safari/537.36')

driver = webdriver.PhantomJS(desired_capabilities=capabilities, service_args=service_args)
driver.get("EDIT: URL REMOVED")
print(driver.page_source)

html response:

<!DOCTYPE html><html class=" no-blobworkers adownload applicationcache no-audiodata no-webaudio no-audio no-lowbattery no-batteryapi no-battery-api blobconstructor blob-constructor canvas todataurljpeg todataurlpng no-todataurlwebp canvastext contenteditable no-contentsecuritypolicy no-contextmenu cookies cors cssanimations backgroundcliptext bgpositionshorthand bgpositionxy bgrepeatround bgrepeatspace backgroundsize bgsizecover borderimage borderradius boxshadow boxsizing csscalc checked csscolumns cubicbezierrange displayrunin display-runin displaytable display-table ellipsis cssfilters flexbox flexboxlegacy no-flexboxtweener fontface generatedcontent cssgradients hsla lastchild cssmask mediaqueries multiplebgs no-objectfit no-object-fit opacity no-overflowscrolling csspointerevents csspositionsticky no-csspseudoanimations csstransitions no-csspseudotransitions cssreflections regions cssremunit cssresize rgba cssscrollbar shapes siblinggeneral subpixelfont no-supports textshadow csstransforms csstransforms3d userselect cssvhunit cssvmaxunit cssvminunit cssvwunit no-wrapflow no-customprotocolhandler no-dart dataview classlist no-createelementattrs no-createelement-attrs dataset no-microdata draganddrop datalistelem details outputelem progressbar meter ruby no-time no-texttrackapi no-track no-emoji no-strictmode no-contains no-devicemotion no-deviceorientation filereader no-filesystem fileinput formattribute no-localizednumber placeholder no-speechinput no-formvalidation fullscreen gamepads no-geolocation hashchange history no-ie8compat sandbox seamless srcdoc indexeddb json olreversed no-mathml no-lowbandwidth eventsource xhr2 xhrresponsetypearraybuffer xhrresponsetypeblob xhrresponsetypedocument no-xhrresponsetypejson xhrresponsetypetext xhrresponsetype notification pagevisibility performance no-pointerevents no-pointerlock postmessage no-quotamanagement requestanimationframe raf scriptasync scriptdefer localstorage sessionstorage websqldatabase no-stylescoped 
svgclippaths svgfilters inlinesvg smil svg touchevents typedarrays unicode no-userdata no-vibrate no-video no-webintents no-webgl no-getusermedia no-peerconnection websocketsbinary websockets no-framed sharedworkers webworkers no-dataworkers no-exiforientation no-apng no-webplossless no-webp svgasimg datauri" style=""><head>
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<meta http-equiv="CacheControl" content="no-cache">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="shortcut icon" href="data:;base64,iVBORw0KGgo=">

<script>

(function(){
    var securemsg;
    var dosl7_common;

window["bobcmn"] = "111111101010102000000052000000052000000002324c533c200000096300000000300000000300000006/TSPD/300000008TSPD_101300000005https200000000200000000";

window.cML=!!window.cML;try{(function(){try{var jj,Jj,Lj=1,sj=1,Sj=1,ij=1;for(var Ij=0;Ij<Jj;++Ij)Lj+=2,sj+=2,Sj+=2,ij+=3;jj=Lj+sj+Sj+ij;window.JO===jj&&(window.JO=++jj)}catch(OJ){window.JO=jj}var zJ=!0;function ZJ(J){J&&(zJ=!1,document.cookie="brav=ad");return zJ}function iJ(){}ZJ(window[iJ.name]===iJ);ZJ("function"!==typeof ie9rgb4);ZJ(/\x3c/.test(function(){return"\x3c"})&!/x3d/.test(function(){return"'x3'+'d';"}));
var IJ=window.attachEvent||/mobi/i.test(window["\x6e\x61vi\x67a\x74\x6f\x72"]["\x75\x73e\x72A\x67\x65\x6et"]),ol=+new Date+6E5,_l,Il,jL=setTimeout,JL=IJ?3E4:6E3;function LL(){if(!document.querySelector)return!0;var J=+new Date,O=J>ol;if(O)return ZJ(!1);O=Il&&_l+JL<J;O=ZJ(O);_l=J;Il||(Il=!0,jL(function(){Il=!1},1));return O}LL();var OL=[17795081,27611931586,1558153217];
function zL(J){J="string"===typeof J?J:J.toString(36);var O=window[J];if(!O.toString)return;var Z=""+O;window[J]=function(J,Z){Il=!1;return O(J,Z)};window[J].toString=function(){return Z}}for(var sL=0;sL<OL.length;++sL)zL(OL[sL]);ZJ(!1!==window.cML);
(function(){var J=-1,J={_:++J,oZ:"false"[J],J:++J,Il:"false"[J],oj:++J,L0:"[object Object]"[J],IL:(J[J]+"")[J],JL:++J,iL:"true"[J],Jj:++J,Zj:++J,OZ:"[object Object]"[J],S:++J,ij:++J,oLj:++J,LLj:++J};try{J.il=(J.il=J+"")[J.Zj]+(J.Ol=J.il[J.J])+(J.LZ=(J.ol+"")[J.J])+(!J+"")[J.JL]+(J.zl=J.il[J.S])+(J.ol="true"[J.J])+(J.Jo="true"[J.oj])+J.il[J.Zj]+J.zl+J.Ol+J.ol,J.LZ=J.ol+"true"[J.JL]+J.zl+J.Jo+J.ol+J.LZ,J.ol=J._[J.il][J.il],J.ol(J.ol(J.LZ+'"\\'+J.J+J.Zj+J.J+J.oZ+"\\"+J.Jj+J._+"("+J.zl+"\\"+J.J+J.ij+
J.J+"\\"+J.J+J.S+J._+J.iL+J.Ol+J.oZ+"\\"+J.Jj+J._+"\\"+J.J+J.S+J.ij+"\\"+J.J+J.Zj+J.J+"\\"+J.J+J.Zj+J.S+J.IL+J.Ol+"\\"+J.J+J.S+J.ij+"['\\"+J.J+J.S+J._+J.Il+"\\"+J.J+J.ij+J.J+"false"[J.oj]+J.Ol+J.Il+J.IL+"']\\"+J.Jj+J._+"===\\"+J.Jj+J._+"'\\"+J.J+J.S+J.JL+J.zl+"\\"+J.J+J.S+J.oj+"\\"+J.J+J.Zj+J.J+"\\"+J.J+J.Zj+J.S+"\\"+J.J+J.Jj+J.ij+"')\\"+J.Jj+J._+"{\\"+J.J+J.oj+"\\"+J.J+J.J+"\\"+J.J+J.S+J.S+J.Il+"\\"+J.J+J.S+J.oj+"\\"+J.Jj+J._+J.iL+J.IL+"\\"+J.J+J.S+J.S+J.OZ+"\\"+J.J+J.ij+J.J+J.Jo+"\\"+J.J+J.Zj+J.oj+
"\\"+J.J+J.Zj+J.JL+"\\"+J.J+J.S+J._+"\\"+J.Jj+J._+"=\\"+J.Jj+J._+"\\"+J.J+J.S+J.ij+"\\"+J.J+J.Zj+J.J+"\\"+J.J+J.Zj+J.S+J.IL+J.Ol+"\\"+J.J+J.S+J.ij+"['\\"+J.J+J.S+J._+J.Il+"\\"+J.J+J.ij+J.J+"false"[J.oj]+J.Ol+J.Il+J.IL+"'].\\"+J.J+J.S+J.oj+J.iL+"\\"+J.J+J.S+J._+"false"[J.oj]+J.Il+J.OZ+J.iL+"(/.{"+J.J+","+J.Jj+"}/\\"+J.J+J.Jj+J.ij+",\\"+J.Jj+J._+J.oZ+J.Jo+"\\"+J.J+J.Zj+J.S+J.OZ+J.zl+"\\"+J.J+J.Zj+J.J+J.Ol+"\\"+J.J+J.Zj+J.S+"\\"+J.Jj+J._+"(\\"+J.J+J.ij+J._+")\\"+J.Jj+J._+"{\\"+J.J+J.oj+"\\"+J.J+J.J+
"\\"+J.J+J.J+"\\"+J.J+J.J+"\\"+J.J+J.S+J.oj+J.iL+J.zl+J.Jo+"\\"+J.J+J.S+J.oj+"\\"+J.J+J.Zj+J.S+"\\"+J.Jj+J._+"(\\"+J.J+J.ij+J._+"\\"+J.Jj+J._+"+\\"+J.Jj+J._+"\\"+J.J+J.ij+J._+").\\"+J.J+J.S+J.JL+J.Jo+J.L0+"\\"+J.J+J.S+J.JL+J.zl+"\\"+J.J+J.S+J.oj+"("+J.oj+",\\"+J.Jj+J._+J.Jj+")\\"+J.J+J.oj+"\\"+J.J+J.J+"\\"+J.J+J.J+"});\\"+J.J+J.oj+"}\\"+J.J+J.oj+'"')())()}catch(O){J%=5}})();var SL=82;window.SZ={IZ:"0895a966bc0180002d019416d74a2e28d1f538ef3103146592d8c25a73dda892c7c585714f95500ba8b6beac1b79be4a3d61e8b7a80de2ffe8aa17af5acaa722530af851815bcaab86168951dee7b2ac8413c027a687d99e48318f014124304bb906e86573dd8e328c3b24cadaf832eea48f8634b3c6e9a0f49eee5235a376e326e984f99d888c10"};function l(J){return 812>J}
function L(J){var O=arguments.length,Z=[];for(var S=1;S<O;++S)Z.push(arguments[S]-J);return String.fromCharCode.apply(String,Z)}function z(J,O){J+=O;return J.toString(36)}(function(J){J||setTimeout(function(){if(!LL())return;var J=setTimeout(function(){},250);for(var Z=0;Z<=J;++Z)clearTimeout(Z);LL()},500)})(zJ);})();}catch(x){document.cookie='brav=oex'+x;}finally{ie9rgb4=void(0);};function ie9rgb4(a,b){return a>>b>>0};

})();

</script>

<script type="text/javascript" src="/TSPD/08e841a5c5ab20002a3554b194594e5f3375d2f994ac4de334932487e4817509e84bbe3658582b13?type=9"></script>
<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>

EDIT: Yes, I understand that PhantomJS is old, but it's what we have to use. My question is about getting something to work in PhantomJS, not about what alternatives are available. All of our servers run Ubuntu Server, including all server images, so we have to use headless browsing. Virtual displays, such as PyVirtualDisplay or any other Xvfb-based method, are too heavy on system resources. All of our codebase uses PhantomJS, so as of right now I have to use it. We also use proxies with username and password authentication, which Chrome has not supported, so the Headless Chrome option is out.
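
For context, the proxy requirement above maps onto PhantomJS's own command-line switches (--proxy, --proxy-type, --proxy-auth), which Selenium forwards through service_args. A minimal sketch, assuming a hypothetical helper named phantomjs_proxy_args and placeholder host, port, and credentials (none of these values come from our real setup):

```python
def phantomjs_proxy_args(host, port, username=None, password=None, proxy_type="http"):
    """Build PhantomJS command-line switches for an (optionally authenticated) proxy.

    Hypothetical helper for illustration; only the switch names are PhantomJS's own.
    """
    args = [
        "--proxy={}:{}".format(host, port),
        "--proxy-type={}".format(proxy_type),
    ]
    # Only add credentials when both parts are supplied.
    if username is not None and password is not None:
        args.append("--proxy-auth={}:{}".format(username, password))
    return args


# Same SSL switches as in the question, plus placeholder proxy settings.
service_args = ["--ignore-ssl-errors=true", "--ssl-protocol=any"]
service_args += phantomjs_proxy_args("proxy.example.com", 8080, "user", "secret")

# The list would then be passed exactly as in the question:
# driver = webdriver.PhantomJS(desired_capabilities=capabilities, service_args=service_args)
```

Keeping the switches in a plain list means the same code path works with or without a proxy.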

I also just tested this code with Headless Chrome, and it is not working either.

python code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path='/path/to/chromedriver', chrome_options=chrome_options)
driver.get("https://www.EDIT-REMOVED.com")
print(driver.page_source)
driver.quit()

html response:

<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>
crookedleaf
  • Is there any reason you wouldn't want to use headless Chrome? You don't need PyVirtualDisplay or similar to use it. PhantomJS is starting to get a little outdated as it's no longer being updated and the latest version is almost two years old. – Dean W. Oct 31 '17 at 20:19
  • even chrome in headless mode uses a lot more resources than phantomjs. granted, it's better than chrome through pyvirtualdisplay, but still. all of our central libraries use phantomjs, as well as all of our virtual machine deployments and images. it would be a long process to have everything converted over even if system resources weren't an issue. phantomjs actually is still actively being developed and updated despite the time since its last major release. you can check out the master branch via git and build from there. phantomjs is still reliable, tried, tested, and true. – crookedleaf Oct 31 '17 at 20:29
  • @crookedleaf PhantomJS is still being developed? I remember the original developer saying that it was dead now that Chrome headless came around, or something along those lines, and there are still 1970 issues pending. That's a lot... – Tetora Oct 31 '17 at 21:11
  • @HaydenDarcy yes, the most recent commit to the master branch was 4 months ago. devs more than likely are working on non-public branches until they merge them into master. i haven't heard anything about the dev saying it's dead. also, in all honesty, a very VERY good amount of those open issues are people asking questions on how to do things, or having issues. they aren't all bugs. i could post my question onto their issues section and it would just add onto it. – crookedleaf Oct 31 '17 at 21:18
  • regardless of everything, this is a question about Selenium and PhantomJS. with all due respect, telling me to use a different browser or talking about lack of development is not helpful. you may as well tell me to be using Java instead of Python, or Windows instead of Linux. – crookedleaf Oct 31 '17 at 21:21
  • @crookedleaf Haha yeah sorry, I hope you find out how to resolve this. Not everyone is aware that phantomjs can have issues so sorry if I'm preaching to the choir. Chrome headless works great so I don't see why you'd use headless except in your case it's a bit hard to change. Sorry I can't be of much help here, good luck. I'd recommend Xvfb or if windows headless windows but you're pretty set on phantomjs :). – Tetora Oct 31 '17 at 21:25
  • @HaydenDarcy lol no worries. i just read the google group post from the PhantomJS developer talking about stepping down. moving away from PhantomJS will have to be something considered in future changes, but it's something that, as said, is impossible right now. We run a bunch of Ubuntu Server VMs, so headless is absolutely required. Chrome in a virtual display, namely PyVirtualDisplay (which runs using Xvfb), is too resource intensive. "Headless Chrome" is something that i will pass on to be looked into, but yeah... as of right now we are stuck with PhantomJS – crookedleaf Oct 31 '17 at 21:30
  • @HaydenDarcy and oh yeah, i'm aware of all the troubles PhantomJS can have, especially with randomly deciding it doesn't want to carry a connection anymore. If i could live without seeing another broken pipe exception, i'd be so happy haha – crookedleaf Oct 31 '17 at 21:32
  • also to everyone, Headless Chrome may never be an option due to Chrome's lack of support for proxy authentication. we need to be able to provide a proxy ip, proxy port, and authentication username and password. currently using Chrome through Selenium has made this impossible. – crookedleaf Oct 31 '17 at 21:45
  • The issue is not with PhantomJS. The page is protected against automation/bots with a script from https://devcentral.f5.com – Florent B. Nov 01 '17 at 00:07
  • @FlorentB. ah, interesting. i've never come across a site that works through non-headless browsing but doesn't through headless browsing, especially when the UserAgent is switched. i also didn't see this anywhere in the page's source or network calls so i totally overlooked that. do you know where in the page that script is? i need to be able to add in something that pretty much says "if site has script, ignore site". we have something that parses through a list of sites, so having it ignore ones would be helpful. – crookedleaf Nov 01 '17 at 00:41
  • The script in question is in the page before the redirection in ` – Florent B. Nov 01 '17 at 00:53
  • Florent B. thank you. seems like it'd be much easier to just pull that site out of our list than to build in handling for seeing scripts like that. it'd be nice if they had a robots.txt file like most sites so i could just parse that and skip the site. – crookedleaf Nov 01 '17 at 01:09
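
The "if site has script, ignore site" idea from the comments could be sketched roughly as below. This is only a heuristic under stated assumptions: the three markers are copied from the HTML response shown in the question (the /TSPD/ script path, the window["bobcmn"] assignment, and the noscript message), and the function names looks_like_js_challenge and has_empty_body are hypothetical. Other protected sites may serve different markup.

```python
import re

# Markers taken from the HTML response in the question; treating them as a
# signature of the anti-bot challenge page is an assumption.
CHALLENGE_MARKERS = (
    re.compile(r'<script[^>]+src="/TSPD/', re.IGNORECASE),
    re.compile(r'window\["bobcmn"\]'),
    re.compile(
        r"<noscript>\s*Please enable JavaScript to view the page content\.",
        re.IGNORECASE,
    ),
)


def looks_like_js_challenge(page_source, min_hits=2):
    """Return True when at least min_hits known markers appear in the page."""
    hits = sum(1 for pattern in CHALLENGE_MARKERS if pattern.search(page_source))
    return hits >= min_hits


def has_empty_body(page_source):
    """Return True when <body> contains nothing but whitespace."""
    return re.search(r"<body[^>]*>\s*</body>", page_source, re.IGNORECASE) is not None


# Possible use inside the loop that walks the site list:
# if looks_like_js_challenge(driver.page_source) or has_empty_body(driver.page_source):
#     continue  # skip this site
```

Requiring two markers rather than one is a deliberate choice to lower the odds of skipping an ordinary page that merely contains a noscript notice.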

0 Answers