
When I navigate to the site manually in Chrome and view the source HTML, I can see the full page source, but when I load the page source via Selenium, I don't get the complete page source.

from bs4 import BeautifulSoup
from selenium import webdriver
import time


driver = webdriver.Chrome(executable_path=r"C:\Python27\Scripts\chromedriver.exe")
driver.get('http://www.magicbricks.com/')

# Switch to the "Buy" tab, enter a location, and run the search
driver.find_element_by_id("buyTab").click()

time.sleep(5)
driver.find_element_by_id("keyword").send_keys("Navi Mumbai")

time.sleep(5)
driver.find_element_by_id("btnPropertySearch").click()

# Give the search results time to load
time.sleep(30)

content = driver.page_source.encode('utf-8').strip()

soup = BeautifulSoup(content, "lxml")

print(soup.prettify())
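A quick way to pin down exactly which parts are missing is to save the source you see in Chrome to a string and diff it against `driver.page_source`. The sketch below uses only the standard library's `difflib`; `missing_lines` is a hypothetical helper name, and the comparison is line-based, so pure layout or whitespace differences will show up too.

```python
import difflib


def missing_lines(browser_source, selenium_source):
    """Return lines that appear in browser_source but not in selenium_source.

    Both arguments are HTML strings. The diff is line-based, so
    reformatted or re-indented markup is also reported as "missing".
    """
    diff = difflib.ndiff(browser_source.splitlines(), selenium_source.splitlines())
    # ndiff marks lines that exist only in the first sequence with "- "
    return [line[2:] for line in diff if line.startswith("- ")]
```

For example, read the manually saved source from a file (the filename is up to you) and call `missing_lines(saved_html, driver.page_source)`; the result is the list of lines to paste into the question.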
Morgan Thrapp
  • Can you add the part of the page source that you're missing with WebDriver? – Grasshopper Aug 19 '16 at 20:42
  • Have you tried putting a `time.sleep(5)` or some other arbitrary delay after the line `driver.get('http://www.magicbricks.com/')`? It could be that the page simply isn't loading quickly enough for the component you're looking for to be available. – Michael Platt Aug 19 '16 at 20:51
  • Also, I noticed that the site has a popup that appears as you start using it. Because of this popup, I had to click the "btnPropertySearch" button twice. I was able to see all of the source code, though. Could you elaborate on what you can't see? – Michael Platt Aug 19 '16 at 21:06
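As an alternative to a fixed `time.sleep(30)`, one option is to poll until the page source stops changing. This is a minimal sketch: `wait_for_stable_source` is a hypothetical helper, not part of Selenium, and it only assumes that `driver` exposes a `page_source` attribute, as Selenium WebDriver objects do. If you know a specific element that signals the results have loaded, Selenium's own `WebDriverWait` with `expected_conditions` is the more targeted tool.

```python
import time


def wait_for_stable_source(driver, timeout=30.0, poll_interval=0.5, stable_polls=3):
    """Poll driver.page_source until it stops changing.

    The page counts as stable once `stable_polls` consecutive reads return
    identical HTML; that HTML is returned. Raises RuntimeError if the
    source is still changing when `timeout` seconds have passed.
    """
    deadline = time.time() + timeout
    last = driver.page_source
    unchanged = 0
    while time.time() < deadline:
        time.sleep(poll_interval)
        current = driver.page_source
        if current == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return current
        else:
            # Content is still changing (e.g. results arriving via AJAX); start counting again
            unchanged = 0
            last = current
    raise RuntimeError("page source still changing after %.1f seconds" % timeout)
```

Usage would be `html = wait_for_stable_source(driver, timeout=60)` right after clicking the search button. Pages with constantly updating content (carousels, timers) may never stabilize, which is why the helper gives up after `timeout`.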

2 Answers


The website may be blocking or restricting Selenium's user agent. An easy test is to change the user agent and see whether that fixes it. There is more information in this question:

Change user agent for selenium driver

Quoting:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("user-agent=whatever you want")

# Newer Selenium releases take options= instead of the older chrome_options= keyword
driver = webdriver.Chrome(options=opts)
Community
  • Thank you for the tip. This user agent worked for me: options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36') – zaheer Dec 16 '21 at 18:10
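To confirm that the override actually took effect, you can ask the browser which user agent it reports. In this sketch, `user_agent_arg` and `reported_user_agent` are hypothetical helper names. The second one only assumes a driver with `execute_script`, which Selenium WebDriver provides, and reads the standard JavaScript property `navigator.userAgent`. The example string is the one from the comment above.

```python
# Example desktop Chrome user-agent string (from the comment above); substitute your own
CHROME_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36")


def user_agent_arg(ua):
    """Build the Chrome command-line switch that overrides the user agent."""
    return "--user-agent=" + ua


def reported_user_agent(driver):
    """Return the user agent that the running browser reports to page scripts."""
    return driver.execute_script("return navigator.userAgent")
```

With a real browser, pass `user_agent_arg(CHROME_UA)` to `opts.add_argument(...)`, start the driver, and check `reported_user_agent(driver) == CHROME_UA` before loading the site.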

Try something like this instead of driver.page_source:

import time

time.sleep(5)  # give the page's JavaScript time to finish rendering
content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

Dynamic web pages often need JavaScript to render their content.
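A small variation on the snippet above: `innerHTML` of the `<html>` element returns only its children, so the `<html ...>` start and end tags are dropped. Reading `document.documentElement.outerHTML` keeps them. `rendered_html` is a hypothetical helper name, and it only assumes a driver with `execute_script`, as Selenium WebDriver provides.

```python
def rendered_html(driver):
    """Return the current DOM serialized as HTML, including the <html> element itself."""
    # document.documentElement is the <html> element; outerHTML includes its own tags
    return driver.execute_script("return document.documentElement.outerHTML")
```

The result can be passed straight to `BeautifulSoup(rendered_html(driver), "lxml")` in place of `driver.page_source`.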

ghchoi