2

I am trying to implement a scraper that checks twice a day whether certain PDFs have changed names. Unfortunately, finding the PDFs requires interacting with the website, so the best solution I can think of is a combination of Selenium and AWS Lambda.

To begin, I followed this tutorial. I completed it but ran into this error from Lambda:

START RequestId: 18637c6d-ea75-40ee-8789-374654700b99 Version: $LATEST
Starting google.com
Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
: WebDriverException
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 46, in lambda_handler
    driver = webdriver.Chrome(chrome_options=chrome_options)
  File "/var/task/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/var/task/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Others have hit this error, and the tutorial's author "resolved" it by linking to this Stack Overflow page. I have gone through it, but all the answers pertain to running headless Chromium on a desktop, not on AWS Lambda.

A couple of changes I've tried, to no avail:

1) Changing the chromedriver and headless-chromium to .exe files
2) Changing this line of code to include the executable_path

driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "/bin/chromedriver.exe")

Any help getting Selenium and AWS Lambda working together would be greatly appreciated.

AG-W
  • Have you added the downloaded Chromium files to your deployment package? If so, try changing the path in your driver command to something like this: `driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "./chromedriver.exe")` – Repakula Srushith May 11 '19 at 16:03
  • Sorry for the late response, but I tried using "./" and am still receiving the same error. – AG-W May 14 '19 at 22:05

2 Answers

3

I had the same issue, and it was caused by the binaries sitting in a location from which they could not be executed. Adding a function that copies them elsewhere and then loading them from the new location fixed it. See the example below, which I got working while researching this error. (Apologies for the messy code.)

import os
import shutil
import time

from selenium import webdriver

# /var/task is read-only on Lambda, so the binaries are copied to /tmp,
# where their permissions can be changed.
BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd() + "/bin"

def _init_bin(executable_name):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it.
    start = time.perf_counter()
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    print("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    shutil.copy2(currfile, newfile)
    print("Giving new binaries permissions for lambda")
    os.chmod(newfile, 0o775)
    elapsed = time.perf_counter() - start
    print(executable_name + " ready in " + str(elapsed) + "s.")

def handler(event, context):

    _init_bin("headless-chromium")
    _init_bin("chromedriver")

    chrome_options = webdriver.ChromeOptions()

    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')

    chrome_options.binary_location = "/tmp/bin/headless-chromium"
    driver = webdriver.Chrome("/tmp/bin/chromedriver", chrome_options=chrome_options)
    driver.get('https://en.wikipedia.org/wiki/Special:Random')
    line = driver.find_element_by_class_name('firstHeading').text
    print(line)
    driver.quit()

    return line

Kyle
  • I'm trying out your solution, but I am still getting a "Message: Can not connect to the Service /tmp/bin/chromedriver" error from Selenium. Have you run across this before? What versions of selenium, chromedriver, and headless-chromium are you using? – Brad Root Jul 29 '20 at 21:04
  • I don't recall that error. I'm using the binary and chrome driver that are included if you clone and download https://github.com/ryfeus/lambda-packs/tree/master/Selenium_Chromium/source – Kyle Jul 30 '20 at 22:15
  • I realized I hadn't changed my PYTHONPATH, I think that solved the problem for me. – Brad Root Jul 30 '20 at 22:33
  • I followed your solution. However, I am getting this ```"errorMessage": "[Errno 2] No such file or directory: '/var/task/bin/headless-chromium'",``` after. Do you happen to know the reason? I am running it on AWS lambda, it seems the bin dir doesn't have the executable I am trying to copy to `/tmp` – RobotCharlie Mar 03 '21 at 21:49
  • Yes, it does seem like that. I'd highly recommend using AWS SAM, which gives you a Lambda-like environment on your desktop via Docker. It lets you poke around and make sure you have permissions and that the files do exist. It makes deployment much easier too. – Kyle Apr 13 '21 at 23:21
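
For anyone hitting the "No such file or directory" error above, a small check like the following may help confirm whether the binaries actually made it into the package and whether they are executable. This is only a sketch: the helper names `describe_binary` and `check_package_binaries` are made up for illustration, and the `bin/` layout next to the handler is assumed from the answer above.

```python
import os
import stat


def describe_binary(path):
    """Report whether `path` is an existing file and whether it is executable."""
    info = {"path": path, "exists": os.path.isfile(path),
            "executable": False, "mode": None}
    if info["exists"]:
        # Permission bits only, e.g. '0o755'
        info["mode"] = oct(stat.S_IMODE(os.stat(path).st_mode))
        info["executable"] = os.access(path, os.X_OK)
    return info


def check_package_binaries(bin_dir, names=("chromedriver", "headless-chromium")):
    """Describe each expected binary under bin_dir (hypothetical layout)."""
    return [describe_binary(os.path.join(bin_dir, name)) for name in names]


if __name__ == "__main__":
    # Assumes the binaries were packaged in ./bin next to the handler.
    for entry in check_package_binaries(os.path.join(os.getcwd(), "bin")):
        print(entry)
```

Calling `check_package_binaries` at the top of the handler and printing the result sends it to the function's log output, which should show quickly whether the problem is a missing file or a permissions issue.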
0

I had the same issue but have fixed it now. In my case, the Python version on Lambda was not the same as the one in my Dockerfile.
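
One way to spot this kind of mismatch is to log the interpreter version from inside the function and compare it with the one used to build the package. A rough sketch, where `EXPECTED_VERSION` and both helper names are placeholders I made up (set the value to whatever your Dockerfile uses):

```python
import sys

# Hypothetical value: the Python version the deployment package was built with.
EXPECTED_VERSION = "3.8"


def runtime_python_version():
    """Return the running interpreter's version as 'major.minor'."""
    return "{}.{}".format(sys.version_info.major, sys.version_info.minor)


def matches_expected(expected, actual=None):
    """True if the running (or supplied) version equals the expected one."""
    if actual is None:
        actual = runtime_python_version()
    return actual == expected


if __name__ == "__main__":
    print("Runtime Python:", runtime_python_version())
    if not matches_expected(EXPECTED_VERSION):
        print("Warning: package was built for Python " + EXPECTED_VERSION)
```

Running the same check in the build container and in the deployed function and comparing the two outputs should show whether the versions line up.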