driver = webdriver.Firefox()
driver.maximize_window()
driver.get(url)
html_source=driver.page_source
html = BeautifulSoup(html_source)
Why are html_source and html different? What am I doing wrong here?
driver.get is not like most other get methods: it only visits the page and does not return its content. You can then obtain the HTML by using driver.page_source:
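from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.maximize_window()
driver.get(url)  # loads the page; the return value is not the HTML
# Name the parser explicitly so the result is the same on every machine
soup = BeautifulSoup(driver.page_source, "html.parser")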
If you call BeautifulSoup with only one argument, it parses the document as HTML, using the best parser it finds installed. If a tag is not valid HTML, the parser corrects it, so the resulting document is modified. That is why html differs from html_source. See "Specifying the parser to use" in the Beautiful Soup documentation.
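To see the correction happen without starting a browser, here is a minimal sketch that parses a deliberately invalid snippet and compares it with the original string. The helper name parse_with and the sample markup are made up for illustration, and it assumes the bs4 package is installed; "html.parser" is the parser from the Python standard library.

```python
from bs4 import BeautifulSoup


def parse_with(markup: str, parser: str = "html.parser") -> str:
    """Parse markup with an explicitly named parser and return it re-serialized.

    parse_with is a hypothetical helper for this example, not part of bs4.
    """
    return str(BeautifulSoup(markup, parser))


# Invalid HTML: an unclosed <a> followed by a stray closing </p>
raw = "<a></p>"
parsed = parse_with(raw)

print(repr(raw))      # the string as it was given to the parser
print(repr(parsed))   # the repaired markup
print(parsed == raw)  # False: parsing changed the document
```

The same thing happens to driver.page_source: the string Selenium returns is whatever the browser serialized, and BeautifulSoup rebuilds it according to the chosen parser's repair rules. Other parsers such as lxml or html5lib repair the same input differently, so naming the parser keeps the result predictable.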