
CasperJS version 1.1.0-beta3, using phantomjs version 1.9.8 on OSX 10.10.4 64-bit.

I'm making progress with my CasperJS and PhantomJS experiments, but today I hit a mystery. Here is the script:

var casper = require('casper').create();
casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36');

casper.start('https://www.youtube.com/').viewport(1200, 800);

casper.wait(5000, function(){
    // Print the page title once the wait is over
    casper.echo(casper.getTitle());
});

casper.run();

As you can see, it's a basic getTitle() call that works on every other site I tried, but not on YouTube. I double- and triple-checked the syntax, tested various URL formats, and ran the same script against many other video streaming and sharing sites; all of them work.

Except with youtube :\

I figured it would help to dump the current HTTP response for YouTube:

{
"contentType": null,
"headers": [],
"id": 1,
"redirectURL": null,
"stage": "end",
"status": null,
"statusText": null,
"time": "2015-12-16T04:44:28.984Z",
"url": "https://www.youtube.com/",
"data": null
}

Versus Vimeo

{
"contentType": "text/html; charset=UTF-8",
"headers": [way too many to paste all of them here]
"id": 2,
"redirectURL": null,
"stage": "end",
"status": 200,
"statusText": "OK",
"time": "2015-12-16T04:50:23.771Z",
"url": "https://vimeo.com/",
"data": null
}
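
The difference between the two dumps can also be checked in code. This is only a minimal sketch, assuming the response object has the shape shown in the dumps above; `describeResponse` is a hypothetical helper name, not part of CasperJS, and the two sample objects just copy the `status`, `statusText` and `url` fields from the dumps.

```javascript
// Hypothetical helper (not a CasperJS API): summarizes a response object
// shaped like the dumps above. A null status means no HTTP response arrived.
function describeResponse(response) {
    if (!response || response.status === null || response.status === undefined) {
        var url = response && response.url ? response.url : 'unknown URL';
        return 'no HTTP response received for ' + url;
    }
    return response.status + ' ' + response.statusText + ' for ' + response.url;
}

// Sample objects reduced to the fields that matter here.
var youtube = { status: null, statusText: null, url: 'https://www.youtube.com/' };
var vimeo = { status: 200, statusText: 'OK', url: 'https://vimeo.com/' };

console.log(describeResponse(youtube));
console.log(describeResponse(vimeo));
```

Inside a CasperJS script the same function could be given the response object that a step callback receives; the sample objects are there only so the sketch runs on its own.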

I enabled verbose:true

This is Youtube

[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: https://www.youtube.com/, HTTP GET
[debug] [phantom] Navigation requested: url=https://www.youtube.com/, type=Other, willNavigate=true, isMainFrame=true
[warning] [phantom] Loading resource failed with status=fail: https://www.youtube.com/
[debug] [phantom] Successfully injected Casper client-side utilities
[debug] [phantom] start page is loaded
[info] [phantom] Step _step 3/4: done in 185ms.
[info] [phantom] Step _step 4/4: done in 300ms.
[info] [phantom] wait() finished waiting for 5000ms.

[info] [phantom] Done 4 steps in 5310ms
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Versus Vimeo

[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: http://www.vimeo.com/, HTTP GET
[debug] [phantom] Navigation requested: url=http://www.vimeo.com/, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] Navigation requested: url=https://vimeo.com/, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "https://vimeo.com/"
[debug] [phantom] Navigation requested: url=https://3797665.fls.doubleclick.net/activityi;src=3797665;type=remar853;cat=Gener-;ord=1452090583?, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Successfully injected Casper client-side utilities
[debug] [phantom] start page is loaded
[info] [phantom] Step _step 3/4 https://vimeo.com/ (HTTP 200)
[info] [phantom] Step _step 3/4: done in 1028ms.
[info] [phantom] Step _step 4/4 https://vimeo.com/ (HTTP 200)
[info] [phantom] Step _step 4/4: done in 1132ms.
[info] [phantom] wait() finished waiting for 5000ms.
Vimeo: Watch, upload and share HD videos with no ads
[info] [phantom] Done 4 steps in 6137ms
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

getTitle() was successful with Vimeo.

Since getTitle() returns only a blank string for YouTube, is there an alternative in the CasperJS documentation? Is there a specific way to casper.start YouTube? Or is this a new YouTube "anti-scraping" measure? If you need further details, let me know.

THE SOLUTION FOLLOWS BELOW!!!

First, thank you to @Vaviloff and @ArtjomB. for all their help!

The problem was solved with the suggestion by @ArtjomB.

Try to run it as casperjs --ignore-ssl-errors=true script.js – Artjom B.

The HTTP response from Youtube after applying the suggestion follows

casperjs --ignore-ssl-errors=true quickstart.js
{
    "contentType": "text/html; charset=utf-8",
    "headers": [way too many to paste them all here],
    "id": 2,
    "redirectURL": null,
    "stage": "end",
    "status": 200,
    "statusText": "OK",
    "time": "2015-12-16T15:25:48.496Z",
    "url": "https://www.youtube.com/",
    "data": null
}

And the verbose after applying the suggestion

[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: https://www.youtube.com/, HTTP GET
[debug] [phantom] Navigation requested: url=https://www.youtube.com/, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "https://www.youtube.com/"
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=https://ad.doubleclick.net/N4061/adi/com.ythome/_default;sz=850x250;tile=1;dc_yt=1;kbsg=HPCA151216;kga=-1;kgg=-1;klg=en;kmyd=video-masthead;ytdevice=1;ytexp=9407700,9414823,9414875,9415327,9416485,9421527,9421905,9424442,9425308,9425351,9425784;ord=7805060704704374?, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=https://pubads.g.doubleclick.net/gampad/ads?ad_rule=0&gdfp_req=1&iu=/6762/mkt.ythome_1x1&scp=kbsg=HPCA151216&kga=-1&kgg=-1&klg=en&kmyd=ad_creative_3&ssl=1&ytdevice=1&sz=1x1&correlator=6133483967278153, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=https://www.youtube.com/video_masthead?video_id=Y0tk-WkDvzs&autocrop=1&site_cta=1&textLine1=Nespresso&textLine2=&subscribe_button=0&subscriber_count=0&small_autoplay=0&video_wall=0&list=&autoplay_start_time=0&autoplay_duration=15000&cta_label=Visit our website, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=https://www.youtube.com/embed/Y0tk-WkDvzs?rel=0&mute=1&wmode=opaque&controls=0&showinfo=0&iv_load_policy=3&enablejsapi=1&adformat=1_8&start=0&modestbranding=1&autoplay=1&nologo=1&origin=https://www.youtube.com, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Successfully injected Casper client-side utilities
[debug] [phantom] start page is loaded
[info] [phantom] Step _step 3/4 https://www.youtube.com/ (HTTP 200)
[info] [phantom] Step _step 3/4: done in 3425ms.
[info] [phantom] Step _step 4/4 https://www.youtube.com/ (HTTP 200)
[info] [phantom] Step _step 4/4: done in 3547ms.
[info] [phantom] wait() finished waiting for 5000ms.
YouTube
[info] [phantom] Done 4 steps in 8548ms
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///usr/local/Cellar/casperjs/1.1-beta3/libexec/bin/bootstrap.js. Domains, protocols and ports must match.

Thank you for taking the time to read :)

  • CasperJS 1.1.0-beta3 + PhantomJS 1.9.8 @ Win 7 x64 - your script correctly returns `YouTube`. What are your versions? – Vaviloff Dec 16 '15 at 05:07
  • @Vaviloff I will edit the question to include my versions. Thanks! – WellWellWell Dec 16 '15 at 05:41
  • Have you checked that Youtube is opened by the script at all? Try casper.capture() and also enable verbose and debug output like this: `var casper = require('casper').create({ verbose: true, logLevel: 'debug', pageSettings: { userAgent: 'useragent string' } });` – Vaviloff Dec 16 '15 at 05:57
  • @Vaviloff Thanks for the advice. No, the script does not open Youtube at all. I'll enable verbose and see what I can find out. I'll update the question. Cheers! – WellWellWell Dec 16 '15 at 05:59
  • @Vaviloff I forgot to mention, I did use casper.capture() and it's blank. – WellWellWell Dec 16 '15 at 06:05
  • Try running script from another IP-address (use a server, a proxy, another ISP). Can you access Youtube via normal browser? – Vaviloff Dec 16 '15 at 06:18
  • Please register to the `resource.error`, `page.error`, `remote.message` and `casper.page.onResourceTimeout` events ([Example](https://gist.github.com/artjomb/4cf43d16ce50d8674fdf#file-2_caspererrors-js)). Maybe there are errors. – Artjom B. Dec 16 '15 at 09:48
  • @Vaviloff I enabled verbose:true and it returned one warning and "Unsafe JavaScript". You can see the verbose paste in the original question. I will try the script from a different IP-address and update. Thanks! – WellWellWell Dec 16 '15 at 12:15
  • @ArtjomB. Hello again :) Thanks for stopping by. I'll register to the `resource.error` event. Thank you for the example. I enabled verbose and pasted the results in the question. It might help determine what's going on. Thanks! – WellWellWell Dec 16 '15 at 12:18
  • Try to run it as `casperjs --ignore-ssl-errors=true script.js` – Artjom B. Dec 16 '15 at 14:12
  • @Vaviloff I just tried the script from a different IP-address, same result. I'll update as soon as I have more info. Thanks! – WellWellWell Dec 16 '15 at 15:19
  • Possible duplicate of [PhantomJS failing to open HTTPS site](http://stackoverflow.com/questions/12021578/phantomjs-failing-to-open-https-site) – Artjom B. Dec 16 '15 at 15:51
  • @ArtjomB. That solved the problem! I'll post the HTTP response and verbose in the question :) Thank you! I'll look into your solution to make sure I understand it. – WellWellWell Dec 16 '15 at 15:52
  • @ArtjomB. It is similar but the context was different imho. That question is specific to HTTPS, but in my case the original script works with other HTTPS sites, but did not work with Youtube. By the way, this was an interesting answer http://stackoverflow.com/a/24679134/1088340 but --ssl-protocol=any does not work in my case :/ . – WellWellWell Dec 16 '15 at 16:07
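
For reference, the event registration suggested in the comments above might look roughly like the following. This is a sketch under assumptions, not the code from the linked gist: `registerErrorLogging` and `log` are hypothetical names, only the three event names mentioned in the comment are used, and the sample error object at the end is made up so the sketch can run under plain Node.js with an `EventEmitter` standing in for the casper instance.

```javascript
// Sketch: attach logging handlers for the events named in the comments.
// `emitter` is anything with an on(name, handler) method, e.g. a casper instance.
// `registerErrorLogging` and `log` are hypothetical names, not CasperJS APIs.
function registerErrorLogging(emitter, log) {
    emitter.on('resource.error', function (resourceError) {
        log('resource.error: ' + resourceError.errorString + ' (' + resourceError.url + ')');
    });
    emitter.on('page.error', function (msg, trace) {
        log('page.error: ' + msg);
    });
    emitter.on('remote.message', function (msg) {
        log('remote.message: ' + msg);
    });
    return emitter;
}

// Stand-in run: a plain EventEmitter replaces casper, and the error object
// below is an invented sample, not output from the script in the question.
var EventEmitter = require('events').EventEmitter;
var lines = [];
var fake = registerErrorLogging(new EventEmitter(), function (line) { lines.push(line); });
fake.emit('resource.error', { errorString: 'example error text', url: 'https://www.youtube.com/' });
console.log(lines[0]);
```

In an actual CasperJS script the call would presumably be `registerErrorLogging(casper, function (line) { casper.echo(line); });`, placed before `casper.start(...)` so the handlers are in place when the first request is made.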
