
I would like to download a webpage the way a browser's "Webpage, Complete" save option does, using urllib, wget, or a similar Python package.

The HTML file saved as "Webpage, Complete" differs from the one saved as "Webpage, HTML Only", and the latter seems to be what wget.download and urllib.request.urlopen produce.


Example URL where the two HTML files differ: https://smash.gg/tournament/genesis-6/events/smash-for-switch-singles/brackets/500500/865126
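For reference, the "HTML Only" behaviour described above amounts to fetching the response body exactly as the server sends it. A minimal sketch with urllib, assuming a browser-like User-Agent is acceptable to the site (the function name `fetch_raw_html` is hypothetical):

```python
import urllib.request


def fetch_raw_html(url: str) -> bytes:
    """Return the page body exactly as the server sends it.

    No JavaScript is executed and no images, stylesheets or scripts are
    downloaded, so AJAX-loaded content will be missing.
    """
    # Some sites reject urllib's default User-Agent, so send a browser-like one
    # (an assumption; adjust as needed).
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Writing the result to disk with `open("page.html", "wb").write(fetch_raw_html(url))` gives a file comparable to the one wget.download produces.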

nathanesau

2 Answers


You can simulate pressing Ctrl+S in a Selenium-driven browser to trigger its "Save Page As" command (found here):

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('https://smash.gg/tournament/genesis-6/events/smash-for-switch-singles/brackets/500500/865126')

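# Hold CTRL, tap "s", release CTRL: the browser's save-page shortcut.
# Depending on the browser, this opens the native "Save As" dialog,
# which Selenium itself cannot control.
save_me = ActionChains(driver).key_down(Keys.CONTROL).send_keys('s').key_up(Keys.CONTROL)
save_me.perform()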
chitown88

The page you've linked relies very heavily on JavaScript, and more specifically on AJAX requests. Wget does not parse JavaScript at all, so any required links that exist only inside the JS source are simply skipped. A browser's "Webpage, Complete" option, by contrast, saves the page as it looks after its scripts have run, and that is what causes the differences you noticed.
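To make this concrete, here is a rough sketch of attribute-based link extraction, the kind an HTML-only downloader relies on (the names `AssetCollector` and `static_asset_urls` are made up for illustration; this is not Wget's actual implementation). It finds URLs in `src` and `href` attributes, but a URL that only appears inside JavaScript, like the hypothetical `fetch("/api/bracket")` call below, is invisible to it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect resource URLs that appear in HTML attributes.

    Only what is written in the markup is seen; URLs built or requested
    inside JavaScript (e.g. AJAX calls) never show up here.
    """

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):  # stylesheets, icons, ...
            self.urls.append(urljoin(self.base_url, attrs["href"]))


def static_asset_urls(html: str, base_url: str) -> list:
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


html = ('<link rel="stylesheet" href="style.css"><script src="/app.js"></script>'
        '<script>fetch("/api/bracket")</script>')
print(static_asset_urls(html, "https://example.com/page/"))
# ['https://example.com/page/style.css', 'https://example.com/app.js']
```

The inline script's `/api/bracket` request is never listed, which is exactly the data a script-heavy page fills in after loading.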

You will likely not be able to save this page completely with something like wget or urllib, since both work primarily with plain HTML. Wget can handle CSS as well, but that's about it. A script-heavy page needs a tool that actually executes JavaScript, i.e. a real browser, so if you really want to save it programmatically, you need to go with Selenium.
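One way to do that with Selenium is to read `driver.page_source`, which returns the current DOM after scripts have run, and write it to a file. This is a minimal sketch, not a full "Webpage, Complete" clone: it saves only the HTML (no images, CSS or scripts), and the fixed `settle_seconds` pause is a crude assumption standing in for a proper `WebDriverWait` on an element you know the AJAX call produces. The function name `save_rendered_html` is hypothetical.

```python
import time
from pathlib import Path


def save_rendered_html(driver, url: str, path: str, settle_seconds: float = 5.0) -> int:
    """Load `url` in a Selenium WebDriver and save the DOM as it looks after
    JavaScript has run. Returns the number of characters written.

    Only the HTML is saved; linked images, CSS and scripts are not downloaded.
    """
    driver.get(url)
    # Crude wait for AJAX content to arrive (assumed long enough).
    time.sleep(settle_seconds)
    # Serialized current DOM, not the original server response.
    html = driver.page_source
    Path(path).write_text(html, encoding="utf-8")
    return len(html)


# Usage (requires selenium and a Chrome/chromedriver install):
# from selenium import webdriver
# driver = webdriver.Chrome()
# try:
#     save_rendered_html(driver, "https://smash.gg/tournament/genesis-6/events/"
#                        "smash-for-switch-singles/brackets/500500/865126", "bracket.html")
# finally:
#     driver.quit()
```

Comparing the saved file against the urllib download should show the bracket data that only appears after the AJAX requests complete.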

darnir