I am new to Python and I'm trying to scrape a table from a website that has multiple pages. How should I use .click() in my code, and where should that call go, so I can scrape the table dynamically across pages?

The website I'm trying is https://free-proxy-list.net/ and I'm able to get the table from the first page. I'm trying to get all the pages and put them into a pandas DataFrame. I have already put the info from the table into a dictionary and tried to put the dict inside a DataFrame, but only the first page ends up in the DataFrame. I need the data from the other pages as well.
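Roughly what I have so far, as a simplified sketch (only two columns shown; the names are illustrative, not my exact code):

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()
driver.get('https://free-proxy-list.net/')

# Only the rows of the currently displayed page exist in the table body.
rows = driver.find_elements_by_xpath("//table[@id='proxylisttable']/tbody/tr")
table = {'IP': [], 'Port': []}
for row in rows:
    table['IP'].append(row.find_element_by_xpath('./td[1]').text)
    table['Port'].append(row.find_element_by_xpath('./td[2]').text)

df = pd.DataFrame(table)  # contains only the first page's rows
driver.quit()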

nam
  • Found the answer here https://stackoverflow.com/questions/42304447/pandas-dataframe-from-list-of-lists-of-dicts – nam Aug 11 '19 at 02:17

1 Answer

  1. Initialize empty lists, one per table column.
  2. Use a while loop that checks the page counter against max_page to iterate over the pages.
  3. Append the row values to the lists on each page iteration.
  4. Load the lists into a pandas DataFrame.
  5. Export the whole DataFrame to a CSV file.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()
driver.get('https://free-proxy-list.net/')

page = 1
max_page = 15

# One list per column; each page's rows get appended to these.
IP = []
Port = []
Code = []
Country = []
Anonymity = []
Google = []
Https = []
LastCheck = []

while page <= max_page:
    # Wait until the current page's rows are rendered before reading them.
    rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(
        (By.XPATH, "//table[@id='proxylisttable']/tbody//tr")))
    for row in rows:
        IP.append(row.find_element_by_xpath('./td[1]').text)
        Port.append(row.find_element_by_xpath('./td[2]').text)
        Code.append(row.find_element_by_xpath('./td[3]').text)
        # textContent also works for cells whose text is hidden or styled away.
        Country.append(row.find_element_by_xpath('./td[4]').get_attribute('textContent'))
        Anonymity.append(row.find_element_by_xpath('./td[5]').text)
        Google.append(row.find_element_by_xpath('./td[6]').get_attribute('textContent'))
        Https.append(row.find_element_by_xpath('./td[7]').text)
        LastCheck.append(row.find_element_by_xpath('./td[8]').get_attribute('textContent'))

    # Only click Next if there is another page to visit.
    if page < max_page:
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
            (By.XPATH, "//a[@aria-controls='proxylisttable' and text()='Next']"))).click()
        print('navigating to page: ' + str(page + 1))
    page = page + 1

driver.quit()

df = pd.DataFrame({"IP": IP, "Port": Port, "Code": Code, "Country": Country,
                   "Anonymity": Anonymity, "Google": Google, "Https": Https,
                   "Last_Checked": LastCheck})
print(df)
df.to_csv('output_IP.csv', index=False)
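If you would rather not hard-code max_page, you can instead stop when the paginator disables the Next control. A minimal sketch of that loop shape (the scraping body is elided; it assumes the standard DataTables markup, where the Next link's parent li element gets a 'disabled' class on the last page):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://free-proxy-list.net/')

while True:
    # Wait for the current page's rows, then scrape them as in the loop above.
    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(
        (By.XPATH, "//table[@id='proxylisttable']/tbody//tr")))
    # ...append the row values here...

    next_link = driver.find_element_by_xpath(
        "//a[@aria-controls='proxylisttable' and text()='Next']")
    # Assumption: DataTables marks the parent <li> as 'disabled' on the last page.
    if 'disabled' in next_link.find_element_by_xpath('..').get_attribute('class'):
        break
    next_link.click()

driver.quit()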

KunduK
  • Thanks! I already tried to put all my data into a dictionary and managed to scrape the pages :) Thanks for your answer! – nam Aug 13 '19 at 01:35
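For reference, the pattern from the question linked in the comments (building the DataFrame from a list of per-row dicts instead of per-column lists) looks like this minimal sketch with made-up values:

import pandas as pd

# One dict per scraped table row, accumulated across all pages.
records = [
    {'IP': '1.2.3.4', 'Port': '8080', 'Code': 'US'},
    {'IP': '5.6.7.8', 'Port': '3128', 'Code': 'DE'},
]

# pandas derives the columns from the dict keys.
df = pd.DataFrame(records)
print(df)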