
When I right-click on a page in my browser, I can "Save Page As", saving the entire webpage including images, CSS, and JS. I've seen questions answered on downloading a page's content, but that only retrieves the HTML. Is there a solution with urllib2, requests, or any other library, to download the complete page?

Alon

  • With `urllib2` or `requests` you have to find all the URLs to images, CSS, and JS in the HTML yourself and download them manually. – furas Feb 03 '17 at 22:47
  • BTW: `wget -p --convert-links http://www.server.com/dir/page.html` – Wikipedia: [wget](https://en.wikipedia.org/wiki/Wget) – furas Feb 03 '17 at 22:53
  • The thing you are looking for is more like a selenium / splash library, which "renders" the whole website and lets you save it (they use browser drivers to parse the data). – Omer Shacham Jan 12 '20 at 18:43
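A minimal sketch of the manual approach furas describes, using only the standard library (requests would work the same way for the fetching step): parse the HTML yourself, collect the absolute URLs of every image, stylesheet, and script, then download each one. The sample markup and the `page.html` base URL below are made up for illustration; actually fetching each asset (e.g. with `urllib.request.urlretrieve`) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetCollector(HTMLParser):
    """Collects absolute URLs of img/script/stylesheet assets on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and "src" in attrs:
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.urls.append(urljoin(self.base_url, attrs["href"]))

# sample markup standing in for a fetched page
html = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="app.js"></script>
</head><body><img src="images/logo.png"></body></html>"""

collector = AssetCollector("http://www.server.com/dir/page.html")
collector.feed(html)
print(collector.urls)
# → ['http://www.server.com/style.css',
#    'http://www.server.com/dir/app.js',
#    'http://www.server.com/dir/images/logo.png']
```

Relative links are resolved against the page URL with `urljoin`, which is the fiddly part the browser normally handles for you; rewriting the saved HTML to point at the local copies (what `wget --convert-links` does) is extra work on top of this.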

1 Answer


You can use pyautogui coupled with selenium to achieve this: selenium opens the page in a real browser, and pyautogui sends the keystrokes for the browser's own "Save as..." dialog. Note that pyautogui types into whatever window has focus, so the browser window must stay in the foreground, and this won't work headless.

import time
from selenium import webdriver
import pyautogui

URL = 'https://example.com'

# open page with selenium
# (first need to download Chrome webdriver, or a firefox webdriver, etc)
driver = webdriver.Chrome()
driver.get(URL)

# open 'Save as...' to save html and assets
pyautogui.hotkey('ctrl', 's')
time.sleep(1)  # give the dialog time to appear
pyautogui.typewrite('your_filename' + '.html')
pyautogui.press('enter')  # confirm the save

Reference

isopach