I have been trying to host a Django project that uses Selenium on a DigitalOcean droplet. I installed everything it needs, but I am getting this error:
Service /usr/bin/chromedriver unexpectedly exited. Status code was: 1\n
If I run the chromedriver command directly in the shell, I get this:
Starting ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
This is my chromedriver version:
ChromeDriver 114.0.5735.198 (c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394})
This is my google-chrome version:
Google Chrome 114.0.5735.198
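As a quick sanity check that the two versions above really line up, here is a minimal sketch that compares the major versions parsed from the two output lines. The helper name major_version is made up for illustration; the input strings are the exact lines shown above.

```python
import re


def major_version(version_line: str) -> int:
    """Return the major version from a `--version` style output line."""
    # Look for a four-part dotted version such as 114.0.5735.198.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_line)
    if match is None:
        raise ValueError(f"no version number found in: {version_line!r}")
    return int(match.group(1))


driver_line = (
    "ChromeDriver 114.0.5735.198 "
    "(c3029382d11c5f499e4fc317353a43d411a5ce1c-refs/branch-heads/5735@{#1394})"
)
chrome_line = "Google Chrome 114.0.5735.198"

# chromedriver and Chrome are expected to share the same major version.
versions_match = major_version(driver_line) == major_version(chrome_line)
print(versions_match)
```

With the lines above both majors are 114, so a version mismatch does not look like the cause here.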
I have deployed it using nginx and gunicorn. The server itself is running well, but I get the error whenever I send a request that uses the Selenium chromedriver.
Here is the relevant snippet from automation.py:
class Scrape:
    def find_email_and_phone(self, url):
        payloads = {
            "company_name": self.remove_and_fetch_name(url),
            "email": "",
            "links": [],
            "numbers": []
        }
        links = []
        driver_location = "/usr/bin/google-chrome"
        # driver_service = Service("/chromedriver_linux64/chromedriver")
        chrome_options_ = Options()
        chrome_options_.add_argument('--verbose')
        chrome_options_.add_argument('--headless')
        chrome_options_.binary_location = '/usr/bin/google-chrome'
        chrome_options_.add_argument('--no-sandbox')
        chrome_options_.add_argument('--disable-dev-shm-usage')
        chrome_options_.add_argument('')
        driver_ = webdriver.Chrome(options=chrome_options_, service=Service(executable_path=driver_location))
        try:
            driver_.get(url)
            page_content = driver_.page_source
            email_pattern = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", page_content)
            # links_pattern = re.search(r"")
            if email_pattern:
                payloads["email"] = email_pattern.group()
                links.append(email_pattern.group())
                # print(links)
            else:
                print("No Email Found!")
            # finding all social links (searching for linkedin / facebook)
            links_pattern = re.findall(r'href=[\'"]?([^\'" >]+)', page_content)
            https_links = [link for link in links_pattern if link.startswith("https://")]
            filtered_links = []
            keywords = ["linkedin"]
            for link in https_links:
                if any(keyword in link for keyword in keywords):
                    filtered_links.append(link)
            payloads["links"] = [link for link in filtered_links]
            # finding phone numbers that are present inside the website
            phone_numbers = re.findall(
                r'\b(?:\+?\d{1,3}\s*(?:\(\d{1,}\))?)?[.\-\s]?\(?(\d{3})\)?[.\-\s]?(\d{3})[.\-\s]?(\d{4})\b',
                page_content)
            formatted_phone_numbers = [
                f"({number[0]}) {number[1]}-{number[2]}" for number in set(phone_numbers)]
            payloads["numbers"] = [number for number in formatted_phone_numbers]
            # df = pd.DataFrame([payloads])
            # df['numbers'] = df['numbers'].apply(lambda x: ', '.join(x))
            # df.to_csv(f"{datetime.now()}.csv", index=False)
            return payloads
        except Exception as e:
            return str(e)
        finally:
            driver_.quit()
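To rule out the parsing logic as the source of the failure, the regex part of the method can be exercised without starting a browser at all. The following is only a sketch: the function name extract_contacts and the sample HTML are made up for illustration, and the patterns are copied from the method above. The one deliberate difference is that the numbers are sorted so the output is stable.

```python
import re

# Patterns copied from find_email_and_phone above.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HREF_RE = re.compile(r'href=[\'"]?([^\'" >]+)')
PHONE_RE = re.compile(
    r'\b(?:\+?\d{1,3}\s*(?:\(\d{1,}\))?)?[.\-\s]?\(?(\d{3})\)?'
    r'[.\-\s]?(\d{3})[.\-\s]?(\d{4})\b'
)


def extract_contacts(page_content, keywords=("linkedin",)):
    """Hypothetical helper: same extraction as above, but browser-free."""
    email_match = EMAIL_RE.search(page_content)
    # Keep only https links that contain one of the keywords.
    links = [
        link for link in HREF_RE.findall(page_content)
        if link.startswith("https://") and any(k in link for k in keywords)
    ]
    # Deduplicate with a set, then sort (the original keeps set order).
    numbers = sorted(
        {f"({a}) {b}-{c}" for a, b, c in PHONE_RE.findall(page_content)}
    )
    return {
        "email": email_match.group() if email_match else "",
        "links": links,
        "numbers": numbers,
    }


# Made-up page content, standing in for driver_.page_source.
SAMPLE_HTML = (
    '<a href="mailto:info@example.com">Mail</a> '
    '<a href="https://www.linkedin.com/company/example">LinkedIn</a> '
    '<a href="http://example.com/about">About</a> '
    '<p>Call 555-123-4567 today</p>'
)

result = extract_contacts(SAMPLE_HTML)
print(result)
```

If this runs the same way in the shell and under the deployed server, the problem is in launching chromedriver rather than in the scraping code.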
Here is my views.py:
def post(self, request):
    try:
        email_and_phone = []
        scrap = Scrape()
        query = request.data.get("query")
        data = scrap.extract_important_links(query, int(request.data.get("number_of_results")))
        for d in data:
            sc = scrap.find_email_and_phone(d)
            email_and_phone.append(sc)
        for item in email_and_phone:
            dataset = DataSet.objects.create(
                company_name=item["company_name"],
                email=item["email"]
            )
            for n in item["numbers"]:
                numbers = Numbers.objects.create(
                    number=n
                )
                dataset.numbers.add(numbers.id)
            for li in item["links"]:
                links = Links.objects.create(
                    link=li
                )
                dataset.links.add(links.id)
        return response({
            "success": True,
            "data": email_and_phone
        }, status=status.HTTP_200_OK)
    except Exception as e:
        return response({
            "success": False,
            "message": str(e)
        }, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
I have gone through a lot of answers on Stack Overflow, but none of them solved this for me. The script runs without throwing any exception when I run it directly:
python3 automation.py
It also runs well when I start the development server with this command:
python3 manage.py runserver my_ip:8000
But it does not work when the request goes through the deployed server (nginx and gunicorn), without the runserver command.
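Since the same code behaves differently depending on how it is started, one way to narrow this down is to compare the process environment in both cases. This is a sketch, not a confirmed diagnosis: the function name describe_environment is made up for illustration, and it only assumes that the user, HOME, PATH, or working directory may differ between my shell and the gunicorn service. It uses the standard library only.

```python
import os
import shutil


def describe_environment(binaries=("chromedriver", "google-chrome")):
    """Collect facts that may differ between a shell and a service process."""
    report = {
        # os.getuid exists on Unix only, hence the guard.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "home": os.environ.get("HOME"),
        "path": os.environ.get("PATH", ""),
        "cwd": os.getcwd(),
    }
    for name in binaries:
        # shutil.which returns None when the binary is not found on
        # this process's PATH or is not executable by this user.
        report[name] = shutil.which(name)
    return report


report = describe_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this once from the shell and once from inside the view (for example by logging the result) would show whether the deployed process sees a different user, HOME, or PATH than the one where chromedriver starts successfully.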