I'm trying to scrape this page: https://www.g2.com/products/dropbox/reviews but I get detected as a bot as soon as the request is made. Is there a way around that?
I tried using Requests before this and got detected as well. I can't use Scrapy in this project, and I can't find any good information online on how to solve this.
Maybe I need to add custom headers?
The current output of the code is the title of the page that tells you that you have been detected:
Pardon Our Interruption
Code:
from selenium import webdriver

def fetch(URL):
    # Configure a headless Chrome session
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--disable-infobars')
    options.add_argument('--disable-extensions')
    options.add_argument('--profile-directory=Default')
    options.add_argument('--incognito')
    options.add_argument('--disable-plugins-discovery')
    options.add_argument('--start-maximized')
    # chromedriver is expected to be on PATH
    driver = webdriver.Chrome(options=options)
    driver.get(URL)
    # Prints "Pardon Our Interruption" when the request is detected
    print(driver.title)
    driver.quit()

fetch('https://www.g2.com/products/dropbox/reviews')
EDIT: I was able to partly work around it and fetch a single page, but on the second run I get detected again. Code:
def fetch(URL):
    firefox_profile = webdriver.FirefoxProfile()
    firefox_profile.set_preference("browser.privatebrowsing.autostart", True)
    browser = webdriver.Firefox(executable_path='geckodriver.exe', firefox_profile=firefox_profile)
    browser.get(URL)
    print(browser.title)

fetch('https://www.g2.com/products/dropbox/reviews')
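To make the custom-headers idea from earlier concrete, here is a minimal sketch of what attaching headers to a request could look like using only the standard library. The `build_request` helper and the values in `EXAMPLE_HEADERS` are hypothetical and chosen only for illustration. The sketch only constructs the request object and sends nothing, and nothing in this post confirms that headers alone avoid the detection.

```python
from urllib.request import Request

# Hypothetical header values, chosen only for illustration.
EXAMPLE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=None):
    """Return a Request carrying the given headers (nothing is sent)."""
    return Request(url, headers=dict(headers or EXAMPLE_HEADERS))


req = build_request("https://www.g2.com/products/dropbox/reviews")
# Show the headers that would accompany the request
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would actually send it. Selenium does not take a headers dictionary like this, so the sketch only applies to the plain-request approach.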