When I right-click on a page in my browser, I can "Save Page As", saving the entire webpage including images, CSS, and JS. I've seen questions answered on downloading a page's content, but that only retrieves the HTML. Is there a solution, with urllib2, requests, or any other library, for downloading the complete page?
-
With `urllib2` or `requests` you have to find all the URLs to images, CSS, and JS in the HTML yourself and download them manually. – furas Feb 03 '17 at 22:47
-
BTW: `wget -p --convert-links http://www.server.com/dir/page.html` – Wikipedia: [wget](https://en.wikipedia.org/wiki/Wget) – furas Feb 03 '17 at 22:53
-
The thing you are looking for is more like Selenium or a Splash library, which renders the whole website and lets you save it (they use browser drivers to parse the data). – Omer Shacham Jan 12 '20 at 18:43
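The manual approach described in the comments above, extracting asset URLs from the HTML before downloading each one, can be sketched with only the standard library. The `AssetCollector` class and `collect_asset_urls` helper below are illustrative names, not part of any library; after collecting the URLs you would fetch each one (e.g. with `requests.get`) and write it to disk:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images, stylesheets, and scripts on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # <img src>, <link href>, and <script src> carry the asset URLs
        if tag == "img" and attrs.get("src"):
            self.assets.add(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.assets.add(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.assets.add(urljoin(self.base_url, attrs["src"]))


def collect_asset_urls(html, base_url):
    """Return the set of asset URLs referenced by the given HTML."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    return parser.assets
```

Note that this only covers assets referenced directly in the markup; resources loaded dynamically by JavaScript will be missed, which is why the browser-driven approaches below exist.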
1 Answer
You can use pyautogui coupled with selenium to achieve this.
```python
import time

import pyautogui
from selenium import webdriver

URL = 'https://example.com'

# Open the page with selenium
# (first need to download the Chrome webdriver, or a Firefox webdriver, etc.)
driver = webdriver.Chrome()
driver.get(URL)

# Trigger the browser's 'Save as...' dialog to save the HTML and its assets
pyautogui.hotkey('ctrl', 's')
time.sleep(1)  # give the dialog time to appear
pyautogui.typewrite('your_filename' + '.html')
pyautogui.press('enter')
```

isopach