I am writing a script that will save the complete contents of a web page. If I try using urllib2 and bs4 it only writes the contents of the logon page and none of the content after navigating to a search within the page. However, if I do a ctrl + s on the search results page, an html file is saved to disk that when opened in a text editor has all of the contents from the search results.
I've read several posts here on the subject and am trying to use the steps in this one:
How to save "complete webpage" not just basic html using Python
However, after installing geckodriver and setting the sys path variable I continue to get errors. Here is my limited code:
from selenium import webdriver
>>> from selenium.webdriver.common.action_chains import ActionChains
>>> from selenium.webdriver.common.keys import Keys
>>> br = webdriver.Firefox()
Here is the error:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 142, in __init__
self.service.start()
File "C:\Python27\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
os.path.basename(self.path), self.start_error_message)
WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
And here is where I set the sys path variable:
I've restarted after setting sys path variable.
UPDATE:
I am now trying to use the chromdriver as this seemed more straight forward. I downloaded hromedriver_win32.zip II'm on a windows laptop) from chromedriver's download page, set the environmetal variable path to: C:\Python27\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe
but am getting the similar following error:
>>> br = webdriver.Chrome()
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 62, in __init__
self.service.start()
File "C:\Python27\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
os.path.basename(self.path), self.start_error_message)
WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home