I am writing a Python script to monitor a website for changes. The aim is to be notified once an element on the page is updated (e.g. a button goes from non-existent to existent). I don't need to log in to an account on the website. Because I don't have much web development experience, I found some code and modified it to meet my needs. Basically it looks like this:

import random
import time

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.common.exceptions import (NoSuchElementException,
                                        WebDriverException)
from selenium.webdriver.common.by import By

screen_dims = [(375, 667), (411, 731), (360, 640), (414, 736), (375, 812),
               (768, 1024), (1024, 1366), (540, 720)]


def main():
    while True:
        # Pick a fresh random user agent for every browser session.
        ua = UserAgent()
        user_agent = ua.random

        options = webdriver.ChromeOptions()
        options.add_experimental_option("excludeSwitches",
                                        ["enable-automation"])
        options.add_experimental_option('useAutomationExtension', False)
        options.add_argument('disable-infobars')
        options.add_argument(f'user-agent={user_agent}')
        driver = webdriver.Chrome(options=options)

        set_viewport_size(driver)
        # a_url_to_the_page_of_interest is a placeholder for the monitored URL.
        driver.get(a_url_to_the_page_of_interest)

        available = check_availability(driver)
        if available:
            print("Found")
            break

        # Not available yet: close the browser and retry after a pause.
        driver.quit()
        time.sleep(10)


def set_viewport_size(driver):
    # Resize the window so the viewport (not the outer window) has a
    # randomly chosen size.
    width, height = random.choice(screen_dims)
    window_size = driver.execute_script(
        """
        return [window.outerWidth - window.innerWidth + arguments[0],
        window.outerHeight - window.innerHeight + arguments[1]];
        """, width, height)
    driver.set_window_size(*window_size)


def check_availability(driver):
    # Dismiss the privacy banner if it is shown; ignore it otherwise.
    try:
        driver.find_element(By.ID, "privacy-button-id").click()
    except WebDriverException:
        pass

    # The element of interest exists only once the page has been updated.
    try:
        driver.find_element(By.ID, "some-other-button")
        return True
    except NoSuchElementException:
        return False


if __name__ == "__main__":
    main()

The problem is that after the 3rd or 4th iteration of the loop in main(), the website I monitor redirects me to a Captcha page (due to frequent refreshing, I guess).

I tried several methods that I could find, such as a fake user agent, different viewport sizes, and a longer refresh interval (waiting 10 s between refreshes), but none of them worked.
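For reference, a less regular schedule than the fixed 10 s wait could look like the sketch below. The function name `next_delay` and its parameters are made up for illustration, and I have not confirmed that this avoids the Captcha on the site I monitor; it only shows the idea of backing off and adding jitter.

```python
import random


def next_delay(base_seconds, captcha_hits, max_seconds=600.0):
    """Return a randomized wait time in seconds.

    The wait doubles for every Captcha page seen so far (capped at
    max_seconds) and is then jittered by +/-50% so that requests do not
    arrive at a fixed interval.
    """
    backoff = min(base_seconds * (2 ** captcha_hits), max_seconds)
    return random.uniform(0.5 * backoff, 1.5 * backoff)


if __name__ == "__main__":
    # Print the delay that would be used after 0, 1, 2 and 3 Captcha pages.
    for hits in range(4):
        print(hits, round(next_delay(10, hits), 1))
```

In the loop above, `time.sleep(10)` would become `time.sleep(next_delay(10, captcha_hits))`, with `captcha_hits` being a hypothetical counter of how often the Captcha page has appeared.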

Some Stack Overflow posts I read and tried: this, this, and this

I don't want to interact with the captcha directly; I just want to avoid it. One idea is to send each request from a different IP address. However, (1) I don't know whether this would help, and (2) if it would, how can I implement it?
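To make the IP idea concrete, this is roughly what I have in mind: cycle through a list of proxies and start each browser session with Chrome's `--proxy-server` switch. The proxy addresses below are placeholders from a documentation-only range, and the helper names `proxy_rotation` and `chrome_arguments` are hypothetical. I have not tested whether this actually prevents the Captcha.

```python
import itertools

# Placeholder proxy endpoints (documentation address range, not real servers).
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the given proxies."""
    return itertools.cycle(proxies)


def chrome_arguments(proxy, user_agent):
    """Build the Chrome command-line arguments for one browser session."""
    return [f"--proxy-server={proxy}", f"--user-agent={user_agent}"]


if __name__ == "__main__":
    rotation = proxy_rotation(PROXIES)
    # Show the arguments for four consecutive sessions; the fourth one
    # wraps around to the first proxy again.
    for _ in range(4):
        print(chrome_arguments(next(rotation), "example-agent"))
```

Each returned string would be passed to `options.add_argument(...)` before `webdriver.Chrome(options=options)` is created, so every iteration of the loop goes out through the next proxy in the list.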

Are there any other choices?

Thank you for your help!

tmsh