from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.maximize_window()
driver.get(url)
html_source = driver.page_source
html = BeautifulSoup(html_source)

Why are html_source and html different? What am I doing wrong here?

Cœur
Abhishek Bhatia

2 Answers


Unlike most other get methods, driver.get does not return the page content; it only navigates the browser to the page. You can then obtain the HTML from driver.page_source:

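from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.maximize_window()
driver.get(url)  # navigates only; returns nothing
soup = BeautifulSoup(driver.page_source, "html.parser")  # name the parser explicitly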
PascalVKooten

If you call BeautifulSoup with only one argument, the document is parsed as HTML with the best parser available. If a tag is not valid HTML, the parser corrects it, so the resulting document differs from the original source. See "Specifying the parser to use" in the Beautiful Soup documentation.
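
As a rough illustration of the repair step described above, the sketch below parses a small fragment and serializes it back to text. The helper name reparse and the sample markup are made up for this example, and the parser is fixed to html.parser; other parsers (lxml, html5lib) may repair the same input differently.

```python
from bs4 import BeautifulSoup


def reparse(markup, parser="html.parser"):
    """Parse markup with the given parser and serialize the tree back to a string.

    `reparse` is a hypothetical helper for this illustration, not part of bs4.
    """
    return str(BeautifulSoup(markup, parser))


# Invalid fragment: an unclosed <a> followed by a stray </p>.
raw = "<a></p>"
repaired = reparse(raw)

print(repaired)         # the parser's repaired version of the fragment
print(repaired == raw)  # False: the parsed document no longer matches the source
```

So a mismatch between html_source and str(html) does not necessarily mean anything went wrong; it can simply reflect the parser's corrections.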

Mihai8
  • Thanks for info! My question is with respect to this code http://stackoverflow.com/questions/30982176/parse-the-html-code-for-a-whole-webpage-scrolled-down. How do you think I can read the entire html in soup? – Abhishek Bhatia Jun 25 '15 at 20:22