
I have used PhantomJS for scraping purposes. I would like to know whether it is possible to download all the contents of a URL (including images, CSS and JS) and save them locally for browsing?

Volatil3

3 Answers

# -*- coding: utf-8 -*-
from selenium import webdriver  # drives PhantomJS, so the source is captured after all AJAX/JS has executed
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# spoof a regular desktop browser user agent
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36")

driver = webdriver.PhantomJS(desired_capabilities=dcap, service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any', '--web-security=false'])
driver.set_window_size(1366, 768)

driver.get('http://stackoverflow.com')

# the fully rendered HTML of the page
print(driver.page_source)

This is complete code that uses Python Selenium + PhantomJS; at the end you have the complete page source.
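To keep a local copy for browsing, a minimal follow-up sketch (Python 3; the file name page.html is an arbitrary choice, not part of the answer above) would write that source to disk:

# sketch: persist the rendered source for offline viewing; "page.html" is an arbitrary name
with open("page.html", "w", encoding="utf-8") as f:
    f.write(driver.page_source)

driver.quit()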

Umair Ayub
  • I said the entire page, including assets. Question updated – Volatil3 Jan 09 '17 at 16:02
  • I don't know about that. :( Maybe you have to iterate over all of them – Umair Ayub Jan 09 '17 at 16:06
  • I know this would be a last resort; I was wondering if some library already exists. – Volatil3 Jan 09 '17 at 16:08
  • You won't be able to download files with just PhantomJS; file download support has been requested for ages, but they don't seem to care. All that said, @Umair's approach could be useful: with bs4 you could get whatever elements you want and then download them using requests (a sketch of this follows these comments). – EndermanAPM Jan 13 '17 at 17:16
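Following the last comment above, a rough sketch of that bs4 + requests approach (Python 3; the tag/attribute mapping, output directory and timeout are assumptions, not anything prescribed in the answers) could look like this:

# Sketch of the approach suggested in the comments: parse the rendered page
# source with BeautifulSoup, collect asset URLs, and download each one locally.
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def download_assets(page_source, base_url, out_dir="assets"):
    os.makedirs(out_dir, exist_ok=True)
    soup = BeautifulSoup(page_source, "html.parser")
    # tags that commonly reference external resources, and the attribute holding the URL
    targets = [("img", "src"), ("script", "src"), ("link", "href")]
    for tag_name, attr in targets:
        for tag in soup.find_all(tag_name):
            url = tag.get(attr)
            if not url:
                continue
            absolute = urljoin(base_url, url)
            filename = os.path.basename(urlparse(absolute).path) or "index"
            try:
                response = requests.get(absolute, timeout=10)
                response.raise_for_status()
            except requests.RequestException:
                continue  # skip assets that fail to download
            with open(os.path.join(out_dir, filename), "wb") as f:
                f.write(response.content)

# e.g. download_assets(driver.page_source, "http://stackoverflow.com")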

We can use the evaluate() function to get the content. I use this in a PhantomJS script.

var webPage = require('webpage');
var page = webPage.create();

page.open('http://google.com', function(status) {

  var title = page.evaluate(function() {
    return document.title;
  });

  console.log(title);
  phantom.exit();

});
Guru

If wget is installed, this task is rather easy:

from subprocess import call

domain = "www.google.de"
# -m mirrors the site recursively, -k converts links so they work for local browsing
call(["wget", "-mk", domain])
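When the aim is a single page plus the assets needed to display it, rather than a full recursive mirror, a closer variant is wget's -p/--page-requisites flag combined with -k (the URL below is only a placeholder):

from subprocess import call

url = "http://stackoverflow.com"
# -p fetches the images, CSS and JS the page needs; -k rewrites links for local browsing
call(["wget", "-p", "-k", url])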