I want to grab all the United States proxies from https://sslproxies.org/. I have grabbed all the table rows, but I am unable to pick out only the proxy records whose country is United States. After that, I want to extract each US proxy's IP with its respective port and save them.

from bs4 import BeautifulSoup as bs
import requests

# loading web page
r = requests.get("https://sslproxies.org/")
# convert to a beautiful-soup object
webpage = bs(r.content, "html.parser")
rows = iter(webpage.find('table').find_all('tr'))
for row in rows:
    for cell in row.find_all('td'):
        print(cell)
    print()
hina fraz
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jun 21 '22 at 13:17

1 Answer

It might be easier to use pandas directly to read and extract the table, for example with `pandas.read_html`.
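A minimal sketch of the pandas route, under a few assumptions: the proxy list is the first table on the page (as in the BeautifulSoup code, which also takes the first `table`), the table has a `Country` column, and a parser backend for `pandas.read_html` (such as lxml) is installed. The function name `us_proxies_from_html` is made up for illustration.

```python
from io import StringIO

import pandas as pd


def us_proxies_from_html(html, country="United States"):
    """Return the rows of the first HTML table whose Country equals `country`."""
    # read_html returns one DataFrame per <table> found in the markup
    tables = pd.read_html(StringIO(html))
    table = tables[0]  # assumption: the proxy list is the first table
    # Keep only the matching rows and renumber them from 0
    return table[table["Country"] == country].reset_index(drop=True)
```

It could be called as `us_proxies_from_html(requests.get("https://sslproxies.org/").text)`; the function takes the HTML text rather than a URL so that the download stays under your control.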

Otherwise, you can check whether any cell in a row is "United States" and, if so, keep that row's cell values (the IP and port are the first two columns):

for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    if "United States" in cols:
        data.append([ele for ele in cols if ele])

Complete code:

from bs4 import BeautifulSoup as bs
import requests

# loading web page
r = requests.get("https://sslproxies.org/")
# convert to a beautiful-soup object
webpage = bs(r.content, "html.parser")

data = list()

rows = iter(webpage.find('table').find_all('tr'))
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    if "United States" in cols:
        data.append([ele for ele in cols if ele])

print(data)

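The list `data` will then contain one row per United States proxy, with all the information you need. Shown as a table, with column headers added for readability: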

                 IP  Port Code        Country    Anonymity Google Https         Last Checked
0       168.8.172.2    80   US  United States  elite proxy     no   yes          17 secs ago
1     20.47.108.204  8888   US  United States    anonymous     no   yes          10 mins ago
2     68.65.184.223  8888   US  United States    anonymous    yes   yes  3 hours 11 mins ago
3       3.82.203.47  3128   US  United States    anonymous     no   yes  4 hours 38 mins ago
4     20.84.106.205  8214   US  United States  elite proxy    yes   yes  5 hours 40 mins ago
5   150.136.139.194  3128   US  United States  elite proxy    yes   yes  6 hours 30 mins ago
6     172.104.24.22  3128   US  United States    anonymous     no   yes  7 hours 40 mins ago
7     159.65.69.186  9300   US  United States    anonymous     no   yes  7 hours 41 mins ago
8       47.252.4.64  8888   US  United States    anonymous     no   yes  9 hours 34 mins ago
9    12.144.254.185  9080   US  United States    anonymous     no   yes  9 hours 34 mins ago
10    66.94.116.111  3128   US  United States    anonymous     no   yes  9 hours 34 mins ago
11     35.170.197.3  8888   US  United States    anonymous     no   yes  9 hours 34 mins ago
leosh
  • I want to save the IPs and ports in separate lists, e.g. `IP = [168.8.172.2, 20.47.108.204, ...]` and `PORT = [80, 8888, ...]`. How can I achieve this? – hina fraz Jun 21 '22 at 15:10
  • Just access parts of the `data` list, for example `ip = [row[0] for row in data]`. Adjust the index `0` to get the column you need. – leosh Jun 26 '22 at 12:42
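
Expanding on the last comment, here is a small sketch of splitting `data` into separate IP and port lists and saving them. It assumes each row has the shape produced by the answer's loop, with the IP at index 0 and the port at index 1. The helper names `split_ip_port` and `save_proxies` and the one-`ip:port`-per-line file format are illustrative choices, not something the thread specifies.

```python
def split_ip_port(data):
    """Split rows shaped like [ip, port, ...] into two parallel lists."""
    ips = [row[0] for row in data]
    # Cell text is a string, so convert the port to an integer
    ports = [int(row[1]) for row in data]
    return ips, ports


def save_proxies(ips, ports, path):
    """Write one "ip:port" entry per line to the file at `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        for ip, port in zip(ips, ports):
            fh.write(f"{ip}:{port}\n")


# Two rows in the shape the answer's loop produces (values from the table above)
data = [
    ["168.8.172.2", "80", "US", "United States", "elite proxy", "no", "yes", "17 secs ago"],
    ["20.47.108.204", "8888", "US", "United States", "anonymous", "no", "yes", "10 mins ago"],
]
ips, ports = split_ip_port(data)
```

Calling `save_proxies(ips, ports, "us_proxies.txt")` afterwards would write the pairs to a file; the file name is only an example.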