We are building a web application in which the user enters a website's URL and the application generates a screenshot of that website. We use PhantomJS to render the screenshot in PNG format. It works well in most cases, but some websites are not rendered correctly. For example, http://dorevi.lt/ looks like this in the browser:

Browser snapshot (tested on the latest Chrome and IE)

However, the screenshot rendered by PhantomJS looks like this:

Image rendered by PhantomJS 2.1

You can see that it stretches the center table and breaks the content in between. What I have tried so far:

  1. Added all sorts of delays, up to 30 seconds, between page load and page render, but no luck.

  2. Tried all the solutions from this answer, which wait for the DOM content (external stylesheets, etc.) to be loaded, but got the same output.

  3. Added every parameter I could find when running the PhantomJS script. This is what my final command looks like: phantomjs.exe --ignore-ssl-errors=true --load-images=true --ssl-protocol=any --debug=true --local-to-remote-url-access=true --web-security=false --disk-cache=false script.js

As you can see, I have used every flag I could find, but the output is still the same. Please help, as we need to generate accurate screenshots of web pages.

Info:

  • PhantomJS version used: 2.1
  • OS: CentOS in production; also tested on Windows 7 with the same output
  • Technology: the application is built with PHP

Edit 1: Adding --debug=true output

2017-12-09T15:31:40 [DEBUG] CookieJar - Created but will not store cookies (use
option '--cookies-file=<filename>' to enable persistent cookie storage)
2017-12-09T15:31:41 [DEBUG] Set  "http"  proxy to:  "" : 1080
2017-12-09T15:31:41 [DEBUG] Phantom - execute: Configuration
2017-12-09T15:31:41 [DEBUG]      0 objectName : ""
2017-12-09T15:31:41 [DEBUG]      1 cookiesFile : ""
2017-12-09T15:31:41 [DEBUG]      2 diskCacheEnabled : "true"
2017-12-09T15:31:41 [DEBUG]      3 maxDiskCacheSize : "-1"
2017-12-09T15:31:41 [DEBUG]      4 diskCachePath : ""
2017-12-09T15:31:41 [DEBUG]      5 ignoreSslErrors : "true"
2017-12-09T15:31:41 [DEBUG]      6 localUrlAccessEnabled : "true"
2017-12-09T15:31:41 [DEBUG]      7 localToRemoteUrlAccessEnabled : "true"
2017-12-09T15:31:41 [DEBUG]      8 outputEncoding : "UTF-8"
2017-12-09T15:31:41 [DEBUG]      9 proxyType : "http"
2017-12-09T15:31:41 [DEBUG]      10 proxy : ":1080"
2017-12-09T15:31:41 [DEBUG]      11 proxyAuth : ":"
2017-12-09T15:31:41 [DEBUG]      12 scriptEncoding : "UTF-8"
2017-12-09T15:31:41 [DEBUG]      13 webSecurityEnabled : "false"
2017-12-09T15:31:41 [DEBUG]      14 offlineStoragePath : ""
2017-12-09T15:31:41 [DEBUG]      15 localStoragePath : ""
2017-12-09T15:31:41 [DEBUG]      16 localStorageDefaultQuota : "-1"
2017-12-09T15:31:41 [DEBUG]      17 offlineStorageDefaultQuota : "-1"
2017-12-09T15:31:41 [DEBUG]      18 printDebugMessages : "true"
2017-12-09T15:31:41 [DEBUG]      19 javascriptCanOpenWindows : "true"
2017-12-09T15:31:41 [DEBUG]      20 javascriptCanCloseWindows : "true"
2017-12-09T15:31:41 [DEBUG]      21 sslProtocol : "any"
2017-12-09T15:31:41 [DEBUG]      22 sslCiphers : "ECDHE-ECDSA-AES128-GCM-SHA256:
ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA:ECD
HE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-RC4-SH
A:ECDHE-RSA-RC4-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:DHE-RSA-AES256-SHA:AES
128-GCM-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:RC4-SHA:RC4-MD5"
2017-12-09T15:31:41 [DEBUG]      23 sslCertificatesPath : ""
2017-12-09T15:31:41 [DEBUG]      24 sslClientCertificateFile : ""
2017-12-09T15:31:41 [DEBUG]      25 sslClientKeyFile : ""
2017-12-09T15:31:41 [DEBUG]      26 sslClientKeyPassphrase : ""
2017-12-09T15:31:41 [DEBUG]      27 webdriver : ":"
2017-12-09T15:31:41 [DEBUG]      28 webdriverLogFile : ""
2017-12-09T15:31:41 [DEBUG]      29 webdriverLogLevel : "INFO"
2017-12-09T15:31:41 [DEBUG]      30 webdriverSeleniumGridHub : ""
2017-12-09T15:31:41 [DEBUG] Phantom - execute: Script & Arguments
2017-12-09T15:31:41 [DEBUG]      script: "script.js"
2017-12-09T15:31:41 [DEBUG] Phantom - execute: Starting normal mode
2017-12-09T15:31:41 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:41 [DEBUG] FileSystem - _open: ":/modules/fs.js" QMap(("mode",
QVariant(QString, "r")))
2017-12-09T15:31:41 [DEBUG] FileSystem - _open: ":/modules/system.js" QMap(("mod
e", QVariant(QString, "r")))
2017-12-09T15:31:41 [DEBUG] FileSystem - _open: ":/modules/webpage.js" QMap(("mo
de", QVariant(QString, "r")))
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 10
2017-12-09T15:31:42 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 30
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 32
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 35
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 37
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 39
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 41
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 43
2017-12-09T15:31:42 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 46
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 48
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 52
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 55
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 58
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 60
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 63
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 67
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 69
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 71
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 74
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 76
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 78
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 81
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 83
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 85
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 87
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 100
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_gid=GA1.2.860165508.1512813703;
expires=Sun, 10-Dec-2017 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_gid=GA1.2.860165508.1512813703;
expires=Sun, 10-Dec-2017 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_gat=1; expires=Sat, 09-Dec-2017
10:02:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 10
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 100
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/fs.js" QMap(("mode",
QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/system.js" QMap(("mod
e", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/webpage.js" QMap(("mo
de", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 10
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/fs.js" QMap(("mode",
QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/system.js" QMap(("mod
e", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/webpage.js" QMap(("mo
de", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 100
2017-12-09T15:31:53 [DEBUG] CookieJar - Purged (session) "CMSSESSID8694f4a4=kpca
79mq05g4v0fnh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:53 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:53 [DEBUG] CookieJar - Saved "_gid=GA1.2.860165508.1512813703;
expires=Sun, 10-Dec-2017 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:53 [DEBUG] CookieJar - Saved "_gat=1; expires=Sat, 09-Dec-2017
10:02:43 GMT; domain=.dorevi.lt; path=/"
Mustafa sabir
  • With `phantomjs --debug=true`, check whether there is any error output. Keep in mind that PhantomJS is old and largely abandoned, so if this website uses JavaScript syntax that it does not support, the page will break. Also check whether what you are seeing is actually the mobile layout of the page or something similar. If it is, you need a larger viewport on your page object. – apokryfos Dec 09 '17 at 09:53
  • Let me add the --debug=true output to the question. – Mustafa sabir Dec 09 '17 at 09:57
  • Try adding a page error handler as well and see if it traps anything. Check http://phantomjs.org/api/phantom/handler/on-error.html – apokryfos Dec 09 '17 at 10:38
  • Same output with `phantom.onError` – Mustafa sabir Dec 09 '17 at 10:49
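
The two suggestions in the comments (a larger viewport and an error handler) could be combined in one script along the following lines. This is only a sketch, not the asker's actual script.js: the helper name `configurePage`, the 1366x768 viewport, the 2-second delay, and the output file `dorevi.png` are assumptions made for illustration. It uses `page.onError` and `page.onResourceError`, which report problems inside the loaded page, whereas `phantom.onError` only catches errors that no page handler picked up. The syntax is kept to ES5 because PhantomJS 2.1 does not understand newer JavaScript.

```javascript
// Sketch only: configurePage is a hypothetical helper, not part of the PhantomJS API.
function configurePage(page, width, height, log) {
    // Use a desktop-sized viewport instead of PhantomJS's small default.
    page.viewportSize = { width: width, height: height };

    // JavaScript errors thrown inside the page itself.
    page.onError = function (msg, trace) {
        log('PAGE ERROR: ' + msg);
    };

    // Resources (stylesheets, scripts, images) that failed to load.
    page.onResourceError = function (resourceError) {
        log('RESOURCE ERROR: ' + resourceError.url +
            ' (' + resourceError.errorString + ')');
    };

    return page;
}

// Runs only under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = configurePage(require('webpage').create(), 1366, 768, function (line) {
        console.log(line);
    });

    page.open('http://dorevi.lt/', function (status) {
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
            return;
        }
        // Assumed 2-second pause so late scripts can finish before rendering.
        setTimeout(function () {
            page.render('dorevi.png');
            phantom.exit();
        }, 2000);
    });
}
```

If the log stays empty and the layout is still broken with a wide viewport, the cause is more likely the old WebKit engine than a loading or timing problem.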

1 Answer

Unfortunately, PhantomJS 2.1.1 is indeed old and has been abandoned in favor of headless Chrome, which you can drive with Puppeteer.

Here is how to take a screenshot with it:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
    // Launch headless Chrome and open a new tab.
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Load the page, then capture its full scrollable height.
    await page.goto('https://www.nytimes.com/');
    await page.screenshot({path: 'full.png', fullPage: true});

    await browser.close();
})();
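
Applied to the page from the question, the same approach might look like the sketch below. The helper name `captureScreenshot`, the 1366x768 viewport, the `networkidle2` wait, and the output file name are assumptions added for illustration, not part of the answer above. The Puppeteer module is passed in as an argument so that the function can be exercised without launching a real browser.

```javascript
'use strict';

// Hypothetical helper: receives the puppeteer module as its first argument.
async function captureScreenshot(puppeteer, url, outputPath) {
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        // Assumed desktop-sized viewport so the desktop layout is rendered.
        await page.setViewport({ width: 1366, height: 768 });
        // Wait until network activity has mostly settled before capturing.
        await page.goto(url, { waitUntil: 'networkidle2' });
        await page.screenshot({ path: outputPath, fullPage: true });
    } finally {
        // Close the browser even if navigation or rendering fails.
        await browser.close();
    }
    return outputPath;
}

// Usage (requires `npm install puppeteer`):
// captureScreenshot(require('puppeteer'), 'http://dorevi.lt/', 'dorevi.png')
//     .then((file) => console.log('Saved ' + file));
```

The `try`/`finally` matters in a web application that takes many screenshots: without it, a failed navigation would leave a Chrome process running.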
Vaviloff