Python urllib wget save complete page

Question

I would like to download a page the way a browser's "Webpage, Complete" save option does, using urllib, wget, or a similar package in Python.

The HTML file produced by "Webpage, Complete" differs from the one produced by "Webpage, HTML Only", and the latter is what wget.download and urllib.request.urlopen seem to give me.

Example URL where those two html files are different: https://smash.gg/tournament/genesis-6/events/smash-for-switch-singles/brackets/500500/865126.

score 0 · Answer 1 · answered Feb 03 '19 at 22:37

You can simulate pressing Ctrl + S to save the page (found here):

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('https://smash.gg/tournament/genesis-6/events/smash-for-switch-singles/brackets/500500/865126')

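# Hold Ctrl, type 's', then release Ctrl. key_down/key_up are meant for
# modifier keys, so the plain 's' goes through send_keys instead.
save_me = ActionChains(driver).key_down(Keys.CONTROL).send_keys('s').key_up(Keys.CONTROL)
save_me.perform()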

chitown88

After running your code, I'm not seeing the downloaded files anywhere. Does the ``Enter`` key need to be pressed or something? – nathanesau Feb 03 '19 at 23:09
Don't think so. I'll look into this more. – chitown88 Feb 04 '19 at 10:09

score 0 · Answer 2 · answered Feb 03 '19 at 23:14

The page you've linked relies very heavily on JavaScript, and more specifically on AJAX requests. Wget does not parse JavaScript at all, so if the JS source references any links that the page requires, Wget will simply skip over them. This is what is causing the differences you noticed.
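To see this for yourself, here is a rough sketch that fetches the raw HTML with urllib and reports how many script tags it contains and how much visible text is actually present in the static source. The names StaticSummary, summarize and fetch_raw_html are made up for illustration; only html.parser and urllib.request from the standard library are used. On a page that builds its content with AJAX, I would expect the visible text to be mostly empty, but that depends on the site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class StaticSummary(HTMLParser):
    """Counts <script> tags and collects text found outside script/style."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text a reader would see without running any scripts.
        if not self._skip_depth and data.strip():
            self.text_chunks.append(data.strip())


def summarize(html):
    """Return (number of script tags, visible text) for an HTML string."""
    parser = StaticSummary()
    parser.feed(html)
    parser.close()
    return parser.script_count, " ".join(parser.text_chunks)


def fetch_raw_html(url):
    """Download the HTML exactly as the server sends it; no scripts are run."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Hypothetical usage; replace the URL with the page you are inspecting.
    scripts, text = summarize(fetch_raw_html("https://example.com"))
    print(scripts, "script tags,", len(text), "characters of visible text")
```

Comparing that output with what you see in the browser gives a quick idea of how much of the page only exists after the scripts have run.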

You will likely not be able to save this page completely with something like Wget or urllib, since they both work primarily with HTML sources only. Wget can handle CSS as well, but that's about it. For a script-heavy page, you need something a lot more complex. If you really want to save it programmatically, you need to go with Selenium.
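A minimal sketch of the Selenium route, assuming Chrome and a matching driver are available: load the page, give the AJAX requests some time, then write driver.page_source (the DOM after the scripts have run) to a file. The helper name save_rendered_html, the output file name and the fixed wait are my own choices for illustration. Note that this saves only the rendered HTML, not the images and stylesheets that "Webpage, Complete" also stores.

```python
import time
from pathlib import Path


def save_rendered_html(url, out_path, driver=None, wait_seconds=5.0):
    """Load url in a browser and write the rendered DOM to out_path.

    Returns the number of characters written.
    """
    own_driver = driver is None
    if own_driver:
        # Imported lazily so the helper can also be used with a driver
        # object that was created elsewhere.
        from selenium import webdriver
        driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Crude fixed pause so AJAX requests can finish; waiting for a
        # known element with WebDriverWait would be more reliable.
        time.sleep(wait_seconds)
        html = driver.page_source  # DOM serialized after scripts have run
    finally:
        if own_driver:
            driver.quit()
    Path(out_path).write_text(html, encoding="utf-8")
    return len(html)


if __name__ == "__main__":
    save_rendered_html(
        "https://smash.gg/tournament/genesis-6/events/smash-for-switch-singles/brackets/500500/865126",
        "bracket.html",  # hypothetical output file name
    )
```

If the bracket takes longer than a few seconds to load, increase wait_seconds or replace the pause with an explicit wait on an element you know appears once the data has arrived.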
