I am trying to build an application that scrapes certain e-commerce websites. I am using Selenium for this and am trying to deploy the application on an EC2 instance running CentOS. I developed the code locally, where it works, but on the remote machine it fails.
This is the code I am using:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
ser = Service(ChromeDriverManager().install())
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
selenium_driver = webdriver.Chrome(service=ser, options=chrome_options)
url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'
selenium_driver.get(url)
title = selenium_driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)
When I run this code on the remote machine, I get the following error and stack trace:
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2091, in __call__
return self.wsgi_app(environ, start_response)
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2076, in wsgi_app
response = self.handle_exception(e)
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/home/ec2-user/price_tracker/flask_api.py", line 22, in home
title, price, isSizeAvailable, shop = prices.checkInfoByShop(url, size)
File "/home/ec2-user/price_tracker/check_prices.py", line 132, in checkInfoByShop
secondaryPriceXPath=secondaryPriceXPath)
File "/home/ec2-user/price_tracker/check_prices.py", line 61, in checkSelenium
title = self.selenium_driver.find_element(By.XPATH, titleXPath)
File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 1246, in find_element
'value': value})['value']
File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
self.error_handler.check_response(response)
File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span"}
(Session info: headless chrome=96.0.4664.110)
Stacktrace:
#0 0x559979e8dee3 <unknown>
#1 0x55997995b608 <unknown>
#2 0x559979991aa1 <unknown>
#3 0x559979991c61 <unknown>
#4 0x5599799c4714 <unknown>
#5 0x5599799af29d <unknown>
#6 0x5599799c23bc <unknown>
#7 0x5599799af163 <unknown>
#8 0x559979984bfc <unknown>
#9 0x559979985c05 <unknown>
#10 0x559979ebfbaa <unknown>
#11 0x559979ed5651 <unknown>
#12 0x559979ec0b05 <unknown>
#13 0x559979ed6a68 <unknown>
#14 0x559979eb505f <unknown>
#15 0x559979ef1818 <unknown>
#16 0x559979ef1998 <unknown>
#17 0x559979f0ceed <unknown>
#18 0x7ff5dd53b40b <unknown>
To debug this, I printed the text of the entire page body using
body = selenium_driver.find_element(By.XPATH, '/html/body')
print(body.text)
which returns
"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at support@everlane.com.\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"
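The body text above appears to be an automated "checking your browser" interstitial rather than the product page, so the product XPath has nothing to match. Below is a minimal sketch, assuming the `selenium_driver` from the snippet above, that detects this page and saves its HTML and a screenshot for inspection. The marker phrases are copied from the output above. The helper names `is_browser_check_page` and `dump_debug_artifacts` and the output file names are hypothetical.

```python
from pathlib import Path

# Phrases copied from the interstitial page text shown above
# (assumption: the block page keeps using this wording).
BROWSER_CHECK_MARKERS = (
    "Checking your browser before accessing",
    "This process is automatic",
)


def is_browser_check_page(page_text: str) -> bool:
    """Return True if the visible text looks like the 'checking your browser' page."""
    return any(marker in page_text for marker in BROWSER_CHECK_MARKERS)


def dump_debug_artifacts(driver, prefix: str = "debug") -> Path:
    """Save the current page's HTML and a screenshot side by side.

    `driver` is any Selenium WebDriver; only its `page_source` property and
    `save_screenshot()` method are used. Returns the path of the HTML file.
    """
    html_path = Path(f"{prefix}.html")
    html_path.write_text(driver.page_source, encoding="utf-8")
    # save_screenshot() writes a PNG and returns False if it could not.
    driver.save_screenshot(f"{prefix}.png")
    return html_path
```

For example, `is_browser_check_page(body.text)` on the `body` element from the debugging snippet returns `True` for the text shown above. Calling `dump_debug_artifacts(selenium_driver, "everlane_check")` at that point writes `everlane_check.html` and `everlane_check.png` to the working directory.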
I understand that my request might be getting blocked for some reason, but is there a way to bypass this?
I have tried adding wait statements in the hope of landing on the redirected page, but nothing has worked so far.