driver.page_source doesn't return all of the source code. It prints some parts of the code in detail, but a large part is missing. How can I fix this?

This is my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys


def htmlToLuna():
    url = 'https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A'
    driver = webdriver.Chrome('C:\\Python27\\chromedriver\\chromedriver.exe')
    driver.get(url)
    # Save the page source to a file and echo it to the console
    web = open('web.txt', 'w')
    web.write(driver.page_source)
    print(driver.page_source)
    web.close()


htmlToLuna()
iamsankalp89
Avo Asatryan
  • I opened the URL in your question. Do you see that the spinner on the page keeps spinning even after the page has loaded? WebDriver won't wait unless you tell it to. – VISWESWARAN NAGASIVAM Sep 02 '17 at 04:33

1 Answer

Here is a simple script: it opens the URL, prints the length of the page source, waits five seconds, and prints the length of the page source again.

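import time

from selenium import webdriver

if __name__ == "__main__":
    browser = webdriver.Chrome()
    browser.get("https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A")
    initial = len(browser.page_source)  # length right after get() returns
    print(initial)
    time.sleep(5)  # blind wait while the page keeps loading
    new_source = browser.page_source
    print(len(new_source))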

The output: 15722 48800

Notice that the length of the page source increases after the wait. You must make sure the page is fully loaded before getting the source. A fixed sleep is not a proper implementation, though, since it waits blindly.
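One step up from a fixed sleep, offered here only as a rough sketch and not as part of the original test above, is to poll the source until its length stops changing. The helper name `wait_for_stable_source` and its parameters are hypothetical. It takes any zero-argument callable that returns the current HTML, so with Selenium you would pass something like `lambda: browser.page_source`. A page that pauses and then loads more content can still fool it.

```python
import time


def wait_for_stable_source(get_source, timeout=10.0, poll=0.5, stable_polls=3):
    """Poll get_source() until its length is unchanged for `stable_polls`
    consecutive polls, or raise TimeoutError after `timeout` seconds.

    get_source is any zero-argument callable returning the current HTML,
    for example `lambda: browser.page_source` (hypothetical usage).
    """
    deadline = time.monotonic() + timeout
    last = get_source()
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = get_source()
        if len(current) == len(last):
            unchanged += 1
            if unchanged >= stable_polls:
                return current
        else:
            # The page grew or shrank, so start counting again
            unchanged = 0
        last = current
    raise TimeoutError("page source did not settle within %s seconds" % timeout)
```

This still guesses from the outside when the page is done, so it is a fallback for when no reliable element is available to wait for.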

A better way is an explicit wait: the browser waits until an element of your choice is found. The timeout here is set to 10 seconds.

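from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

if __name__ == "__main__":
    browser = webdriver.Chrome()
    browser.get("https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A")
    try:
        # Wait up to 10 seconds for the element to appear in the DOM
        WebDriverWait(browser, 10).until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, '.CodeMirror > div:nth-child(1) > textarea:nth-child(1)')))
        print("Result:")
        print(len(browser.page_source))
    except TimeoutException:
        print("Your exception message here!")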

The output: Result: 52195

Reference:

https://stackoverflow.com/a/26567563/7642415

http://selenium-python.readthedocs.io/locating-elements.html

Hold on! Even that gives no guarantee of getting the full page source, since individual elements are loaded dynamically. As soon as the browser finds the element, it moves on. So pick an element that only appears once the page has fully loaded.
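If no single element reliably marks the end of loading, one possible approach is to combine two checks: the document reports itself as loaded, and a loading indicator such as the spinner is gone. This is a sketch under assumptions. The function name `wait_for_page_settled` and the `busy_selector` parameter are hypothetical, and the real selector for the spinner has to be found by inspecting the page. The only driver call it relies on is `execute_script`.

```python
import time


def wait_for_page_settled(driver, busy_selector, timeout=10.0, poll=0.25):
    """Return True once document.readyState is "complete" and no element
    matches busy_selector; return False if `timeout` seconds pass first.

    busy_selector is a hypothetical CSS selector for a loading indicator.
    """
    deadline = time.monotonic() + timeout
    while True:
        ready = driver.execute_script("return document.readyState") == "complete"
        # Count elements that still match the loading indicator selector
        busy = driver.execute_script(
            "return document.querySelectorAll(arguments[0]).length", busy_selector
        )
        if ready and busy == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With a real browser this might be called as `wait_for_page_settled(browser, ".spinner")` before reading `browser.page_source`, where `.spinner` is a placeholder and not the site's actual class name.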

P.S. I am using Python 3, and the webdriver is on my PATH environment variable, so my code needs a few changes to work with Python 2.x. I guess only the print statements need to be modified.

  • Thank you. Does Selenium have an option to open a new tab instead of opening the browser again? – Avo Asatryan Sep 02 '17 at 05:19
  • @AvoAsatryan Yes, it does. Google gives you thousands of answers; here is one: https://stackoverflow.com/a/28432939/7642415. If you are working with a new URL and won't go back to the previous one, you can call the webdriver.get method again without closing the browser. – VISWESWARAN NAGASIVAM Sep 02 '17 at 05:28
  • Yes, that does the same. It won't close the browser. All it does is pass Ctrl + T to the currently open browser, which opens a new tab. – VISWESWARAN NAGASIVAM Sep 02 '17 at 05:31
  • No, I meant something else. For example, I open Chrome and then run my script; I want Selenium to open a new tab in the already open session that was running before the script started. – Avo Asatryan Sep 02 '17 at 05:34
  • Let me make sure I understand: you request a URL from the driver, it opens a browser (in our case Chrome) and loads the URL, and then you want to load another URL in the previously opened Chrome browser? Is this what you want? I'm not a native English speaker, so please make it clear for me. – VISWESWARAN NAGASIVAM Sep 02 '17 at 05:38
  • Chrome is already running. Then I run the Python script with Selenium. Selenium opens a new session, but I want it to use the already existing session. – Avo Asatryan Sep 02 '17 at 05:43
  • Can you show me your code? You can add it to your question. – VISWESWARAN NAGASIVAM Sep 02 '17 at 05:44
  • import re
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    import os
    import time
    def htmlToLuna():
        url ='https://codefights.com/tournaments/Xph7eTJQssbXjDLzP/A'
        driver = webdriver.Chrome('C:\\Python27\\chromedriver\\chromedriver.exe')
        driver.get(url)
        time.sleep(5)
        webhuck=open('webhuck.txt','w')
        x=driver.page_source
        x=re.findall("cm-def\"\>\\b(\w{6,})\\b",x,re.M)
        print x
        x=" ".join(x)
        webhuck.write(x)
        #driver.quit()
        webhuck.close()
    print htmlToLuna()
    – Avo Asatryan Sep 02 '17 at 05:46
  • After running this code, it opens a new session, but I want it to open a new tab in the already existing session. – Avo Asatryan Sep 02 '17 at 05:47
  • You mean you have opened a URL in a regular browser (Chrome) and want the webdriver to open further URLs in new tabs of that regular browser rather than opening a new session? That is not possible. If you don't want a visible browser window but still want to perform the operation, you need a virtual/headless browser like this: http://phantomjs.org/ – VISWESWARAN NAGASIVAM Sep 02 '17 at 05:53
  • Can't I do such a thing with Chrome? – Avo Asatryan Sep 02 '17 at 05:56
  • To my knowledge, no. Try PhantomJS instead; it lets you open a browser virtually. http://phantomjs.org/ – VISWESWARAN NAGASIVAM Sep 02 '17 at 05:58