
I have been trying to host a Django project that uses Selenium on a DigitalOcean droplet. I installed everything that is needed, but I am getting this error:

Service /usr/bin/chromedriver unexpectedly exited. Status code was: 1

If I run the chromedriver command, I get this:

Starting ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.

This is my chromedriver version:

ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394})

This is my google-chrome version:

Google Chrome 114.0.5735.198 
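ChromeDriver supports the Chrome release with the same major version, so it is worth confirming that the two version strings agree (here both report 114). A minimal sketch that parses the --version output shown above; parse_version and majors_match are hypothetical helper names, and in practice you would feed them the captured output of chromedriver --version and google-chrome --version:

```python
import re

# Matches lines such as "ChromeDriver 114.0.5735.198 (...)" or "Google Chrome 114.0.5735.198".
VERSION_RE = re.compile(r"^(ChromeDriver|Google Chrome)\s+(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_version(output):
    """Return (product, (major, minor, build, patch)) from a --version line, or None."""
    match = VERSION_RE.match(output.strip())
    if match is None:
        return None
    product = match.group(1)
    numbers = tuple(int(part) for part in match.groups()[1:])
    return product, numbers


def majors_match(chromedriver_output, chrome_output):
    """True when ChromeDriver and Chrome report the same major version."""
    driver = parse_version(chromedriver_output)
    chrome = parse_version(chrome_output)
    if driver is None or chrome is None:
        return False
    return driver[1][0] == chrome[1][0]


driver_out = "ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394})"
chrome_out = "Google Chrome 114.0.5735.198 "
print(majors_match(driver_out, chrome_out))  # True
```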

I have deployed it with Nginx and Gunicorn. The server itself is running and everything else works, but I get the error whenever I send a request that uses Selenium/ChromeDriver.

Here is the relevant code from automation.py:

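import re

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


class Scrape: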
    def find_email_and_phone(self, url):
        payloads = {
            "company_name": self.remove_and_fetch_name(url),
            "email": "",
            "links": [],
            "numbers": []
        }
        links = []
        driver_location = "/usr/bin/google-chrome"
        # driver_service = Service("/chromedriver_linux64/chromedriver")
        chrome_options_ = Options()
        chrome_options_.add_argument('--verbose')
        chrome_options_.add_argument('--headless')
        chrome_options_.binary_location = '/usr/bin/google-chrome'
        chrome_options_.add_argument('--no-sandbox')
        chrome_options_.add_argument('--disable-dev-shm-usage')
        chrome_options_.add_argument('')
        driver_ = webdriver.Chrome(options=chrome_options_, service=Service(executable_path=driver_location))
        try:
            driver_.get(url)
            page_content = driver_.page_source
            email_pattern = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", page_content)
            # links_pattern = re.search(r"")
            if email_pattern:
                payloads["email"] = email_pattern.group()
                links.append(email_pattern.group())
                # print(links)
            else:
                print("No Email Found!")

            # finding all social links (searching for linkedin / facebook)
            links_pattern = re.findall(r'href=[\'"]?([^\'" >]+)', page_content)
            https_links = [link for link in links_pattern if link.startswith("https://")]
            filtered_links = []
            keywords = ["linkedin"]
            for link in https_links:
                if any(keyword in link for keyword in keywords):
                    filtered_links.append(link)
            payloads["links"] = [link for link in filtered_links]

            # finding phone numbers that are present inside the website

            phone_numbers = re.findall(
                r'\b(?:\+?\d{1,3}\s*(?:\(\d{1,}\))?)?[.\-\s]?\(?(\d{3})\)?[.\-\s]?(\d{3})[.\-\s]?(\d{4})\b',
                page_content)
            formatted_phone_numbers = [
                f"({number[0]}) {number[1]}-{number[2]}" for number in set(phone_numbers)]

            payloads["numbers"] = [number for number in formatted_phone_numbers]

            # df = pd.DataFrame([payloads])
            # df['numbers'] = df['numbers'].apply(lambda x: ', '.join(x))
            # df.to_csv(f"{datetime.now()}.csv", index=False)

            return payloads

        except Exception as e:
            return str(e)

        finally:
            driver_.quit()
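Because the parsing only needs page_source, one option is to move it into a pure function that can be tested without launching Chrome, which helps separate parsing problems from the ChromeDriver failure. A sketch reusing the same regular expressions as above; extract_contacts is a hypothetical name, and unlike the original it sorts the phone numbers so the output is deterministic:

```python
import re

# Same patterns as in find_email_and_phone.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')
PHONE_RE = re.compile(
    r'\b(?:\+?\d{1,3}\s*(?:\(\d{1,}\))?)?[.\-\s]?\(?(\d{3})\)?[.\-\s]?(\d{3})[.\-\s]?(\d{4})\b'
)


def extract_contacts(page_content, keywords=("linkedin",)):
    """Parse the first email, matching https links and phone numbers from raw HTML."""
    email_match = EMAIL_RE.search(page_content)
    # Keep only absolute https links that contain one of the keywords.
    https_links = [link for link in HREF_RE.findall(page_content)
                   if link.startswith("https://")]
    links = [link for link in https_links
             if any(keyword in link for keyword in keywords)]
    # Deduplicate the (area, prefix, line) tuples, then format and sort them.
    numbers = sorted(
        f"({a}) {b}-{c}" for a, b, c in set(PHONE_RE.findall(page_content))
    )
    return {
        "email": email_match.group() if email_match else "",
        "links": links,
        "numbers": numbers,
    }
```

Inside find_email_and_phone, the try block would then reduce to driver_.get(url) followed by payloads.update(extract_contacts(driver_.page_source)).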

Here is my views.py:

from rest_framework import status
from rest_framework.response import Response


# Method of the DRF APIView that handles this endpoint
def post(self, request):
    try:
        email_and_phone = []
        scrap = Scrape()
        query = request.data.get("query")
        data = scrap.extract_important_links(query, int(request.data.get("number_of_results")))
        for d in data:
            sc = scrap.find_email_and_phone(d)
            email_and_phone.append(sc)

        for item in email_and_phone:
            dataset = DataSet.objects.create(
                company_name=item["company_name"],
                email=item["email"]
            )
            for n in item["numbers"]:
                numbers = Numbers.objects.create(
                    number=n
                )
                dataset.numbers.add(numbers.id)
            for li in item["links"]:
                links = Links.objects.create(
                    link=li
                )
                dataset.links.add(links.id)

        return Response({
            "success": True,
            "data": email_and_phone
        }, status=status.HTTP_200_OK)
    except Exception as e:
        return Response({
            "success": False,
            "message": str(e)
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
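Separately from the ChromeDriver error, find_email_and_phone returns str(e) instead of a payload dict when scraping fails after the driver has started, and indexing item["company_name"] on such a string raises TypeError. A minimal sketch of filtering those results out before saving; split_results is a hypothetical helper, not part of the original code:

```python
def split_results(results):
    """Separate successful scrape payloads (dicts) from error strings."""
    payloads, errors = [], []
    for item in results:
        if isinstance(item, dict):
            payloads.append(item)
        else:
            # find_email_and_phone returned str(e) for this URL.
            errors.append(str(item))
    return payloads, errors


results = [
    {"company_name": "acme", "email": "info@acme.com", "links": [], "numbers": []},
    "Message: timeout",
]
ok, failed = split_results(results)
print(len(ok), failed)  # 1 ['Message: timeout']
```

The view would then loop over ok when creating DataSet rows and could return failed alongside the data.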

I have looked at a lot of solutions on Stack Overflow, but none of them worked for me. The code runs fine when I run the script directly:

python3 automation.py

It doesn't throw any exception. It also runs fine when I start the development server with:

python3 manage.py runserver my_ip:8000

But it fails when the request goes to the deployed server (Nginx + Gunicorn) instead of runserver.
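Since the same code works under runserver from a shell but not behind Gunicorn, one way to narrow it down is to log what the Gunicorn worker actually sees and compare it with the interactive shell: the user it runs as, HOME, PATH, and which chromedriver and google-chrome binaries it resolves. A stdlib-only sketch; runtime_report is a hypothetical helper, and whether any of these values actually differ on the droplet is an assumption to verify:

```python
import getpass
import os
import shutil


def current_user():
    """Login name of the process, or None if it cannot be determined."""
    try:
        return getpass.getuser()
    except (KeyError, OSError):  # e.g. a uid with no passwd entry
        return None


def runtime_report():
    """Collect details that often differ between an interactive shell and a service."""
    return {
        "user": current_user(),
        "home": os.environ.get("HOME"),
        "path": os.environ.get("PATH", ""),
        # shutil.which returns None when the binary is not on this process's PATH.
        "chromedriver": shutil.which("chromedriver"),
        "google_chrome": shutil.which("google-chrome"),
    }
```

Logging runtime_report() once from the view (for example with logging.getLogger(__name__).info) and once from python3 manage.py shell shows side by side what changes between the two environments.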

Sakib ovi

1 Answer


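Using Selenium 4 and the Service argument, you no longer need to pass the executable_path key; pass the path to the chromedriver binary directly (not the google-chrome browser binary).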

So your effective line of code will be:

driver_location = "/chromedriver_linux64/chromedriver"
driver_ = webdriver.Chrome(options=chrome_options_, service=Service(driver_location))
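The question's code passes /usr/bin/google-chrome (the browser) as the Service path, while the browser path belongs in binary_location. A quick sanity check for that kind of mix-up, assuming the chromedriver --version output format shown in the question; is_chromedriver is a hypothetical helper:

```python
import subprocess


def is_chromedriver(path):
    """Return True if running `path --version` reports ChromeDriver.

    google-chrome prints "Google Chrome ..." instead, so it returns False.
    Missing or non-executable paths also return False.
    """
    try:
        completed = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.stdout.strip().startswith("ChromeDriver")
```

Calling it on driver_location right before Service(driver_location) surfaces a wrong path before Chrome is ever launched.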

However, with Selenium v4.6 and above, Selenium Manager takes care of finding or downloading a matching chromedriver binary, so you can drop the Service argument altogether. Your effective code block will then be:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options_ = Options()
chrome_options_.add_argument('--verbose')
chrome_options_.add_argument('--headless')
chrome_options_.binary_location = '/usr/bin/google-chrome'
chrome_options_.add_argument('--no-sandbox')
chrome_options_.add_argument('--disable-dev-shm-usage')
# No Service: Selenium Manager resolves chromedriver (Selenium 4.6+)
driver_ = webdriver.Chrome(options=chrome_options_)
undetected Selenium