2

I am trying to get the HTML of a web page, but only about a quarter of the page shows up in the output.

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.hltv.org/matches")

print(driver.page_source)

It feels like I have tried everything, but I still get the same result. The printed output doesn't start at the top of the page; it starts far down, almost at the end.

Anyone got a clue?

Tonechas
lidas21
  • Does this answer your question? [Python selenium screen capture not getting whole page](https://stackoverflow.com/questions/26211056/python-selenium-screen-capture-not-getting-whole-page) – avocadoLambda May 17 '20 at 09:16
  • I don't want a screen capture. I am looking for the HTML code. – lidas21 May 17 '20 at 09:21

2 Answers

1

Try the code below; it worked for me. It writes the page source to a file instead of printing it.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.hltv.org/matches")

# Write the page source to a file; "w" overwrites the output of earlier runs
with open("asd.html", "w", encoding="utf8") as file:
    file.write(driver.page_source)

driver.quit()
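
If you want to check how much HTML was actually captured, the same idea can be wrapped in a small helper. This is only a sketch: save_html is a hypothetical name, not part of Selenium, and the guess that the console shows only the tail of a very long print is an assumption rather than something confirmed in this thread.

```python
from pathlib import Path


def save_html(html, path):
    """Write html to path as UTF-8, overwriting any existing file.

    Returns the number of characters written.
    """
    Path(path).write_text(html, encoding="utf8")
    return len(html)


# Usage with Selenium (assumes Chrome and a matching chromedriver are installed):
#     from selenium import webdriver
#     driver = webdriver.Chrome()
#     driver.get("https://www.hltv.org/matches")
#     print(save_html(driver.page_source, "asd.html"), "characters saved")
#     driver.quit()
```

If the saved file starts with the top of the document, the page source itself was complete and only the console display was cut short.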
YWILLS
0

It could be that the page has not finished loading by the time you print it. By default, driver.get returns once the initial page load completes, but content added afterwards by JavaScript may still be missing.

To fix this, you could try waiting for a known element to load before printing.

To wait for an element (the one with ID "backToLoginDialog" in the example below), adjust your code as follows:

from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# set up the driver and the maximum wait time (in seconds)
driver = webdriver.Chrome()
timeout = 5

# create your "wait" function
def wait_for_load(element_id):
    element_present = EC.presence_of_element_located((By.ID, element_id))
    WebDriverWait(driver, timeout).until(element_present)

driver.get('https://www.hltv.org/matches')
wait_for_load('backToLoginDialog')
print(driver.page_source)
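
If you don't know which element to wait for, a cruder alternative is to poll until the page source stops changing. This is a different technique from the explicit wait above, and only a sketch: wait_for_stable_source and its parameters are hypothetical names, and an unchanged length over a few checks is a heuristic, not a guarantee that loading has finished.

```python
import time


def wait_for_stable_source(driver, interval=0.5, stable_checks=3, timeout=10.0):
    """Poll driver.page_source until its length stops changing.

    Returns the last page source seen, either after `stable_checks`
    consecutive polls with an unchanged length or when `timeout`
    seconds have passed. `driver` is any object with a `page_source`
    attribute, such as a Selenium WebDriver.
    """
    deadline = time.monotonic() + timeout
    last = driver.page_source
    unchanged = 0
    while unchanged < stable_checks and time.monotonic() < deadline:
        time.sleep(interval)
        current = driver.page_source
        if len(current) == len(last):
            unchanged += 1
        else:
            unchanged = 0  # the page is still growing or shrinking; start over
        last = current
    return last


# Usage with Selenium (assumes Chrome and a matching chromedriver are installed):
#     from selenium import webdriver
#     driver = webdriver.Chrome()
#     driver.get("https://www.hltv.org/matches")
#     html = wait_for_stable_source(driver)
```

Waiting for a specific element is more precise when you know one that appears late; the polling version trades that precision for not needing to inspect the page first.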
Hayden Eastwood
  • I did not get it to work. I am no expert, so this code probably works, but I got a few errors. Thanks anyway. – lidas21 May 17 '20 at 12:16
  • I made an edit (I forgot to include WebDriverWait in the previous version). This worked fine on my system, copying and pasting straight from this post. I also found "backToLoginDialog" on the page itself to test for the load. Looks like you got a solution in any case though :) – Hayden Eastwood May 17 '20 at 12:23