
The code below uses Selenium WebDriver to scrape a list of working IP addresses that can be used as proxies.

# ProxyIPs.py

from selenium import webdriver
from selenium.webdriver.common.by import By

# Get free proxies for rotating
def get_free_proxies():
    driver = webdriver.Chrome()
    driver.get('https://sslproxies.org')

    table = driver.find_element(By.TAG_NAME, 'table')
    thead = table.find_element(By.TAG_NAME, 'thead').find_elements(By.TAG_NAME, 'th')
    tbody = table.find_element(By.TAG_NAME, 'tbody').find_elements(By.TAG_NAME, 'tr')

    # Column names from the table header ('IP Address', 'Port', ...)
    headers = [th.text.strip() for th in thead]

    proxies = []
    for tr in tbody:
        tds = tr.find_elements(By.TAG_NAME, 'td')
        # Map each header to the corresponding cell text in this row
        proxy_data = {header: td.text.strip() for header, td in zip(headers, tds)}
        proxies.append(f"{proxy_data['IP Address']}:{proxy_data['Port']}")
    driver.quit()
    
    return proxies

I am currently using this code in a program that accesses a website through different, randomly chosen IP addresses, restarting the driver every 10-15 minutes. This is necessary because I need to retrieve data from the site every few seconds, and performance degrades noticeably over time.
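To make the rotation concrete, here is a simplified sketch of the loop I am describing; the 10-minute interval and the scrape_once step are placeholders rather than my actual scraping logic:

import random
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import ProxyIPs as proxy

def new_driver(proxies):
    # Start a fresh Chrome instance routed through a randomly chosen proxy
    chrome_options = Options()
    chrome_options.add_argument('--proxy-server=http://%s' % random.choice(proxies))
    return webdriver.Chrome(options=chrome_options)

proxies = proxy.get_free_proxies()
driver = new_driver(proxies)
started = time.time()

while True:
    # scrape_once(driver)   # placeholder for the actual data retrieval
    time.sleep(2)           # the site is queried every few seconds
    if time.time() - started > 10 * 60:   # restart the driver every 10-15 minutes
        driver.quit()
        driver = new_driver(proxies)
        started = time.time()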

Here is an example of the code I have written:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import random
from time import sleep
import ProxyIPs as proxy

chrome_options = Options()

proxies = proxy.get_free_proxies()  # from ProxyIPs.py above

PROXY = random.choice(proxies)

chrome_options.add_argument('--proxy-server=http://%s' % PROXY)
driver = webdriver.Chrome(options=chrome_options)

driver.get("https://tuludictionary.in/dictionary/cgi-bin/web/frame.html")
driver.maximize_window()

# switch into the search frame
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"frame[name='search']")))

# click on Anywhere radio button
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='SearchType'][value='Anywhere']"))).click()

# Input string in the text box
search_box = driver.find_element(By.CSS_SELECTOR, "input[name='search']")
search_box.clear()
search_box.send_keys('AA')

# Click on the search button
search_button = driver.find_element(By.CSS_SELECTOR, "input[name='sButton']")
search_button.click()

sleep(20)
driver.quit()

While the code is functional, the proxies it returns are often unreliable: they frequently fail to establish a solid connection, so the frames on the target page load only partially. This inconsistency sometimes prevents data retrieval altogether, causing the program to fail or produce inaccurate results. Since the program will run for several hours, I need a way to speed up the data transfer.

Is there an alternative approach that improves speed by using my local IP address itself (so that the page at least loads, even if it becomes slow after many accesses), or a different way to obtain proxy addresses that work more reliably?

Alternatively, is there a way to verify the strength of a proxy's connection before using it to load the website? A method to check in advance whether a proxy address has a reliable connection would be ideal.
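For example, I imagine something along the following lines, where each scraped proxy is probed with a quick HTTP request before being handed to Selenium (the requests library, the httpbin.org test URL, and the 5-second timeout are just my guess at how such a check could look):

import requests

def filter_working_proxies(proxies, test_url='https://httpbin.org/ip', timeout=5):
    # Keep only the proxies that answer a simple request within the timeout
    working = []
    for p in proxies:
        proxy_cfg = {'http': 'http://%s' % p, 'https': 'http://%s' % p}
        try:
            response = requests.get(test_url, proxies=proxy_cfg, timeout=timeout)
            if response.ok:
                working.append(p)
        except requests.RequestException:
            pass  # proxy timed out or refused the connection; skip it
    return working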

For example: IP Address: 89.147.201.147:3128

This is what the browser looks like when the site doesn't load
