I am currently trying to implement a scraper that will check twice a day for if certain PDFs change names. Unfortunately it requires website manipulation to find the pdfs so the best solution in my mind is a combination of Selenium and AWS Lambda.
To begin I was following this tutorial. I have completed the tutorial but ran into this error from Lambda:
START RequestId: 18637c6d-ea75-40ee-8789-374654700b99 Version: $LATEST
Starting google.com
Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
: WebDriverException
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 46, in lambda_handler
driver = webdriver.Chrome(chrome_options=chrome_options)
File "/var/task/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
self.service.start()
File "/var/task/selenium/webdriver/common/service.py", line 83, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
This error was experienced by others and was "resolved" by the author by linking to this stack overflow page. I have tried going through it but all the answers are pertaining to using headless chromium on desktop not AWS lambda.
A couple of changes Ive tried to no avail.
1) Changing the chromedriver and headless-chromium to .exe files
2) Changing this line of code to include the executable_path
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "/bin/chromedriver.exe")
Any help in getting selenium and aws lambda working together would be greatly appreciated.