
Good afternoon, comrades. This problem has gone unsolved for me for some time: I have tried many options, tested the problem together with my friends, and reported it to the developers of the library, but no solution has been found. It concerns the requests library in Python. When a request is sent through an anonymous proxy server to sites that report your external IP, my own IP is returned. I approached the question deliberately and combined knowledge I have acquired here on Stack Overflow.

My code specifically contains a function to check my external IP:

def check_my_ip(
        header={},
        use_proxy: bool = False,
        proxy_dict={}):
    my_ip: str = ""
    message = []
    flag = False
    try:
        my_ip = requests.get(url='https://ifconfig.me/', headers=header, proxies=proxy_dict, verify=False)
        my_ip = my_ip.text
        if len(my_ip) > 15: my_ip=""
    except:
        my_ip = ""
    if my_ip =="":
        try:
            my_ip = requests.get('https://ramziv.com/ip', headers=header, proxies=proxy_dict, verify=False).text
            if len(my_ip) > 15: my_ip=""
        except:
            my_ip = ""
    if my_ip == "":
        try:
            s = requests.get('https://2ip.ua/ru/', headers=header, proxies=proxy_dict, verify=False)
            b = BeautifulSoup(s.text, "html.parser")
            b = b.select(" .ipblockgradient .ip")[0].getText()
            m = re.search(r"(\d{1,3}\.){3}\d{1,3}", b)
            my_ip = m.group(0) if m else ""
            if len(my_ip) > 15: my_ip=""
        except:
            my_ip=""
    if my_ip!="":
        print("Received IP: " + my_ip)
        flag = True
    else:
        print("Failed to get IP")

    return {'flag': flag, 'result': my_ip, 'message': message}

This function accepts a proxy dictionary if one is passed to it.
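For example, a call through a proxy looks like this (the address below is a placeholder, not a live proxy):

proxy_dict = {'http': '203.0.113.10:8080'}  # placeholder proxy address
result = check_my_ip(header={'User-Agent': 'Mozilla/5.0'}, use_proxy=True, proxy_dict=proxy_dict)
print(result['result'])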

I also have a function to get a fake user agent:

def take_header():
    headers: dict = {}
    message = []
    flag = False
    try:
        user = fake_useragent.UserAgent().random
        headers = {'User-Agent': user}
        flag = True
        print("Fake agent received!\n" + str(headers))
    except:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
        print("Error getting fake agent, falling back to a static header!\n" + str(headers))
    return {'flag': flag, 'result': headers, 'message': message}
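Either way, the call site only needs the result key of the returned dictionary:

headers = take_header()['result']  # e.g. {'User-Agent': 'Mozilla/5.0 ...'}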

I get proxies from the well-known site https://www.us-proxy.org/ using a table-processing function I wrote. It filters the proxies on offer, and I keep only those marked anonymous.

def take_proxy(url: str = "",
               headers: dict = {},
               proxies: dict = {},
               take_http: bool = False,
               take_https: bool = False):
    proxy_dict: dict = {}
    message = []
    flag = False

    try:
        try:
            res = requests.get(url=url, headers=headers, proxies=proxies)
            print("Got table from page " + str(url))
        except Exception as exc:
            print("Error getting table from page " + str(url))
            print("Error text: " + str(exc))
            raise
        soup = BeautifulSoup(res.text, "lxml")
        table_proxy_list = soup.find('table', class_="table table-striped table-bordered")
        proxy_list = []
        for row in table_proxy_list.tbody.find_all('tr'):
            columns = row.find_all('td')
            temp_proxy_items = proxy_items_us_proxy_org()
            if columns:
                temp_proxy_items.IP_Address = columns[0].text.strip()
                temp_proxy_items.Port = int(columns[1].text.strip())
                temp_proxy_items.Code = columns[2].text.strip()
                temp_proxy_items.Country = columns[3].text.strip()
                temp_proxy_items.Anonymity = columns[4].text.strip() in ("anonymous", "elite proxy")
                temp_proxy_items.Google = columns[5].text.strip() == "yes"
                temp_proxy_items.Https = columns[6].text.strip() == "yes"
                temp_proxy_items.Last_Checked = columns[7].text.strip()
                proxy_list.append(temp_proxy_items)
                columns = None
        table_head = [str(table_head_item.text).replace(" ", "_") for table_head_item in
                      table_proxy_list.thead.find_all('th')]
        df_proxy_list = pd.DataFrame.from_records([t.__dict__ for t in proxy_list], columns=table_head)
        df_proxy_list['HTTP_S'] = np.where(df_proxy_list['Https'] == True, "https", "http")
        df_proxy_list['IP_PORT'] = df_proxy_list.agg('{0[HTTP_S]}://{0[IP_Address]}:{0[Port]}'.format, axis=1)
        df_proxy_list_http = df_proxy_list.query('Https==False & Anonymity==True')['IP_PORT'].to_list()
        df_proxy_list_https = df_proxy_list.query('Https==True & Anonymity==True')['IP_PORT'].to_list()
        proxy_dict = {}
        if take_http:
            proxy_dict["http"] = df_proxy_list_http
        if take_https:
            proxy_dict["https"] = df_proxy_list_https
        print("Proxy list received: \n" + str(proxy_dict))
        flag = True       
    except Exception as exc:
        print("The proxy list is empty because of an error in proxy table processing")
        print("Error text: " + str(exc))
    return {'flag': flag, 'result': proxy_dict, 'message': message}

I combine all these functions in this order:

  1. I determine my own IP;
  2. I run the function that scrapes the proxy site and get a list of proxies filtered by the anonymity condition;
  3. I check each proxy by sending it back into the first function, to make sure the IP-check sites report the proxy address in response.

if __name__ == '__main__':
    my_ip = check_my_ip()
    print("My IP: " + my_ip['result'])
    header = take_header()['result']
    proxies = take_proxy(url="https://www.us-proxy.org/", headers=header, take_http=True, take_https=True)
    for item in proxies['result']:
        proxy_items = proxies['result'][item]
        proxy_dict_to_send = {}
        for proxy_items_i in proxy_items:
            proxy_dict_to_send["http"] = proxy_items_i
            print("Used by proxy: "+ str(proxy_dict_to_send))
            result_check = check_my_ip(header = header, use_proxy = True, proxy_dict = proxy_dict_to_send)
            print (result_check)
            proxy_dict_to_send = {}

I prepared this long piece of code because when I asked questions about individual points, I received answers that are already available on the net. Please spend some time on my code; this really does look like a problem in the library that needs some kind of solution.

I have posted the full version of my code here.

I am also attaching the file with the code for your convenience.

Please look into the problem and help me understand it. Maybe I am wrong (along with three of my comrades), or maybe a real problem has been identified.

Perhaps you can suggest some temporary alternative solutions? It may be possible to implement similar code with other libraries, or in some other way entirely.

I need to fetch information anonymously, substituting a proxy address for my own. Maybe there is some other library for sending requests to sites; any options are welcome.
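One direction I have looked at is urllib3's ProxyManager, which routes every request from a pool through a single proxy. A minimal sketch (untested by me; the proxy address is a placeholder):

import urllib3

# Placeholder proxy address, not a live proxy.
proxy = urllib3.ProxyManager('http://203.0.113.10:8080')
resp = proxy.request('GET', 'http://ifconfig.me/')
print(resp.data.decode())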

Full minimal version of the code:

import fake_useragent
from bs4 import BeautifulSoup, element
import requests
import pandas as pd
import numpy as np
import re
import http.client
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

class proxy_items_us_proxy_org:
    def __init__(self, IP_Address: str = "",
                 Port: int = 0,
                 Code: str = "",
                 Country: str = "",
                 Anonymity:  bool = False,
                 Google: bool = False,
                 Https: bool = False,
                 Last_Checked: str = ""):
        self.IP_Address = IP_Address
        self.Port = Port
        self.Code = Code
        self.Country = Country
        self.Anonymity = Anonymity
        self.Google = Google
        self.Https = Https
        self.Last_Checked = Last_Checked

    def __str__(self):
        return self.IP_Address + ":" \
               + str(self.Port) +"; " \
                + self.Code + "; " \
                + self.Country +"; " \
                + str(self.Anonymity) + "; " \
                + str(self.Google) + "; " \
               + str(self.Https) + "; " \
               + self.Last_Checked

    def __repr__(self):
        return self.__str__()

    def to_list(self):
        return [
        self.IP_Address,
        self.Port,
        self.Code,
        self.Country,
        self.Anonymity,
        self.Google,
        self.Https,
        self.Last_Checked
        ]

def check_my_ip(
        header=None,
        use_proxy: bool = False,
        proxy_dict=None):
    my_ip: str = ""
    message = []
    flag = False
    try:
        my_ip = requests.get(url='https://ifconfig.me/', headers=header, proxies=proxy_dict, verify=False)
        my_ip = my_ip.text
        if len(my_ip) > 15: my_ip=""
    except:
        my_ip = ""
    if my_ip =="":
        try:
            my_ip = requests.get('https://ramziv.com/ip', headers=header, proxies=proxy_dict, verify=False).text
            if len(my_ip) > 15: my_ip=""
        except:
            my_ip = ""
    if my_ip == "":
        try:
            s = requests.get('https://2ip.ua/ru/', headers=header, proxies=proxy_dict, verify=False)
            b = BeautifulSoup(s.text, "html.parser")
            b = b.select(" .ipblockgradient .ip")[0].getText()
            m = re.search(r"(\d{1,3}\.){3}\d{1,3}", b)
            my_ip = m.group(0) if m else ""
            if len(my_ip) > 15: my_ip=""
        except:
            my_ip=""
    if my_ip!="":
        print("Received IP: " + my_ip)
        flag = True
    else:
        print("Failed to get IP")

    return {'flag': flag, 'result': my_ip, 'message': message}


def take_proxy(url: str = "",
               headers: dict = None,
               proxies: dict = None,
               take_http: bool = False,
               take_https: bool = False):
    proxy_dict: dict = {}
    message = []
    flag = False

    try:
        try:
            res = requests.get(url=url, headers=headers, proxies=proxies)
            print("Got table from page " + str(url))
        except Exception as exc:
            print("Error getting table from page " + str(url))
            print("Error text: " + str(exc))
            raise
        soup = BeautifulSoup(res.text, "lxml")
        table_proxy_list = soup.find('table', class_="table table-striped table-bordered")
        proxy_list = []
        for row in table_proxy_list.tbody.find_all('tr'):
            columns = row.find_all('td')
            temp_proxy_items = proxy_items_us_proxy_org()
            if columns:
                temp_proxy_items.IP_Address = columns[0].text.strip()
                temp_proxy_items.Port = int(columns[1].text.strip())
                temp_proxy_items.Code = columns[2].text.strip()
                temp_proxy_items.Country = columns[3].text.strip()
                temp_proxy_items.Anonymity = columns[4].text.strip() in ("anonymous", "elite proxy")
                temp_proxy_items.Google = columns[5].text.strip() == "yes"
                temp_proxy_items.Https = columns[6].text.strip() == "yes"
                temp_proxy_items.Last_Checked = columns[7].text.strip()
                proxy_list.append(temp_proxy_items)
                columns = None
        table_head = [str(table_head_item.text).replace(" ", "_") for table_head_item in
                      table_proxy_list.thead.find_all('th')]
        df_proxy_list = pd.DataFrame.from_records([t.__dict__ for t in proxy_list], columns=table_head)
        df_proxy_list['HTTP_S'] = np.where(df_proxy_list['Https'] == True, "https", "http")
        df_proxy_list['IP_PORT'] = df_proxy_list.agg('{0[IP_Address]}:{0[Port]}'.format, axis=1)
        df_proxy_list_http = df_proxy_list.query('Https==False & Anonymity==True')['IP_PORT'].to_list()
        df_proxy_list_https = df_proxy_list.query('Https==True & Anonymity==True')['IP_PORT'].to_list()
        proxy_dict = {}
        if take_http:
            proxy_dict["http"] = df_proxy_list_http
        if take_https:
            proxy_dict["https"] = df_proxy_list_https
        print("Proxy list received: \n" + str(proxy_dict))
        flag = True       
    except Exception as exc:
        print("The proxy list is empty because of an error in proxy table processing")
        print("Error text: " + str(exc))
    return {'flag': flag, 'result': proxy_dict, 'message': message}


if __name__ == '__main__':
    my_ip = check_my_ip()
    print("My IP: " + my_ip['result'])
    try:
        user = fake_useragent.UserAgent().random
        headers = {'User-Agent': user}
    except:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    proxies = take_proxy(url="https://www.us-proxy.org/", headers=headers, take_http=True, take_https=True)
    for item in proxies['result']:
        proxy_items = proxies['result'][item]
        proxy_dict_to_send = {}
        for proxy_items_i in proxy_items:
            proxy_dict_to_send["http"] = proxy_items_i
            proxy_dict_to_send["https"] = proxy_items_i
            print("Used by proxy: "+ str(proxy_dict_to_send))
            result_check = check_my_ip(header = headers, use_proxy = True, proxy_dict = proxy_dict_to_send)
            print (result_check)
            proxy_dict_to_send = {}


  • There's a lot of code here, and most of it doesn't seem relevant to the question. If you want people to help, please post a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): that means the smallest amount of code that reproduces the problem you're asking about, ideally presented so that we can simply copy it and run it locally to see the behavior ourselves. – larsks Aug 28 '22 at 21:06
  • But looking at your code, I wonder if you're running into [this problem](https://stackoverflow.com/questions/26320899/why-is-the-empty-dictionary-a-dangerous-default-value-in-python). – larsks Aug 28 '22 at 21:07
  • @larsks, thanks for the link, I really didn't know about that. And thanks for your attention! I've shortened the code a bit, but I assure you it is minimal for reproducing the problem exactly; you can take the full minimal version of the code below, and it runs and reproduces the problem as-is. The code is this long to head off objections such as: your proxy is not working, you are not using a header, the IP-check sites do not work correctly. I have already tried all of these. I also keep the proxy-selection function so I can iterate over all the proxies and filter out a working list. –  Aug 28 '22 at 21:36
  • A minimal example would really only include your `check_my_ip` method and a static list of working proxies. All that business about parsing the remote proxies list with BeautifulSoup isn't relevant. – larsks Aug 28 '22 at 22:38

1 Answer

Your question is about the behavior of your check_my_ip method. Everything else is extraneous and doesn't need to be included in your question: we're not trying to debug the behavior of BeautifulSoup. For a minimal example, pick some proxies that work and then use them to test the check_my_ip method.

That might look something like this (note that I've replaced your various "tell me my IP" websites with icanhazip.com, which returns the IP address as plain text, again greatly simplifying the code):

import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def take_proxy(**kwargs):

    proxy_dict = {
        "http": [
            "45.79.158.235:1080",
            "47.88.6.186:30001",
            "47.251.12.73:3128",
            "198.59.191.234:8080",
            "66.175.223.147:4153",
            "194.195.213.197:1080",
            "194.195.216.153:4145",
            "47.88.8.118:30001",
        ]
    }

    return {"flag": True, "result": proxy_dict, "message": ""}


def check_my_ip(header=None, proxy_dict=None):
    my_ip = None
    message = None
    flag = False
    try:
        res = requests.get(
            url="https://icanhazip.com/",
            headers=header,
            proxies=proxy_dict,
            verify=False,
        )
        my_ip = res.text
        flag = True
    except Exception as err:
        message = str(err)

    return {"flag": flag, "result": my_ip, "message": message}


if __name__ == "__main__":
    my_ip = check_my_ip()
    print("My IP: " + my_ip["result"])
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    }
    proxies = take_proxy(
        url="https://www.us-proxy.org/",
        headers=headers,
        take_http=True,
        take_https=True,
    )
    for item in proxies["result"]:
        proxy_items = proxies["result"][item]
        for proxy_items_i in proxy_items:
            proxy_dict_to_send = {"http": proxy_items_i}
            print("Used by proxy: " + str(proxy_dict_to_send))
            result_check = check_my_ip(header=headers, proxy_dict=proxy_dict_to_send)
            print(result_check)

We've gone from 165 lines to only 62 lines and drastically simplified the logic so that it's easier for people to follow. Running this, we see exactly the behavior you described:

My IP: 1.2.3.4

Used by proxy: {'http': '45.79.158.235:1080'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '47.88.6.186:30001'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '47.251.12.73:3128'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '198.59.191.234:8080'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '66.175.223.147:4153'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '194.195.213.197:1080'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '194.195.216.153:4145'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}
Used by proxy: {'http': '47.88.8.118:30001'}
{'flag': True, 'result': '1.2.3.4\n', 'message': None}

What's going on here? Well, note that in check_my_ip (both in my version and in your original version), the websites you're using to determine your IP address are all https:// websites, but you only provide an http entry in your proxy_dict:

proxy_dict_to_send["http"] = proxy_items_i

That means requests only knows about a proxy for http:// URLs, and will use no proxy for https:// URLs.
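You can see this scheme-based lookup directly with requests' internal helper `select_proxy` (an internal utility shown purely for illustration; the proxy address is a placeholder):

import requests.utils

proxies = {'http': '203.0.113.10:8080'}
# The proxy is chosen by the URL's scheme: with no 'https' entry, https:// URLs get no proxy.
print(requests.utils.select_proxy('http://icanhazip.com/', proxies))   # '203.0.113.10:8080'
print(requests.utils.select_proxy('https://icanhazip.com/', proxies))  # None

If we modify the code so that it looks like this instead: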

for proxy_items_i in proxy_items:
    # specify proxies for both http:// and https:// urls
    proxy_dict_to_send = {"http": proxy_items_i, "https": proxy_items_i}
    print("Used by proxy: " + str(proxy_dict_to_send))
    result_check = check_my_ip(header=headers, proxy_dict=proxy_dict_to_send)
    print(result_check)

And then run it again, we see the results we want!

My IP: 1.2.3.4

Used by proxy: {'http': '45.79.158.235:1080', 'https': '45.79.158.235:1080'}
{'flag': True, 'result': '45.79.158.235\n', 'message': None}
Used by proxy: {'http': '47.88.6.186:30001', 'https': '47.88.6.186:30001'}
{'flag': True, 'result': '47.88.6.186\n', 'message': None}
Used by proxy: {'http': '47.251.12.73:3128', 'https': '47.251.12.73:3128'}
{'flag': True, 'result': '47.251.12.73\n', 'message': None}
Used by proxy: {'http': '198.59.191.234:8080', 'https': '198.59.191.234:8080'}
{'flag': True, 'result': '198.59.191.249\n', 'message': None}
Used by proxy: {'http': '66.175.223.147:4153', 'https': '66.175.223.147:4153'}
{'flag': True, 'result': '66.175.223.147\n', 'message': None}
Used by proxy: {'http': '194.195.213.197:1080', 'https': '194.195.213.197:1080'}
{'flag': True, 'result': '194.195.213.197\n', 'message': None}
Used by proxy: {'http': '194.195.216.153:4145', 'https': '194.195.216.153:4145'}
{'flag': True, 'result': '194.195.216.153\n', 'message': None}
Used by proxy: {'http': '47.88.8.118:30001', 'https': '47.88.8.118:30001'}
{'flag': True, 'result': '47.88.8.118\n', 'message': None}
– larsks
  • Thanks for your reply. In fact, your version of the code works. To implement the same thing as yours, I've made all the changes to my full minimal version of the code. In particular, in the `take_proxy` function I changed the line with the type of proxy received: it was like this: ```#df_proxy_list['IP_PORT'] = df_proxy_list.agg('{0[HTTP_S]}://{0[IP_Address]}:{0[Port]}'.format, axis=1)``` and became like this: ```df_proxy_list['IP_PORT'] = df_proxy_list.agg('{0[IP_Address]}:{0[Port]}'.format, axis=1)``` –  Aug 29 '22 at 07:12
  • I also changed the main part: ```proxy_dict_to_send["http"] = proxy_items_i; proxy_dict_to_send["https"] = proxy_items_i```. Now everything works just like yours. This is the output I get: ```Used by proxy: {'http': '198.49.68.80:80', 'https': '198.49.68.80:80'} {'flag': False, 'result': '', 'message': [{'message': 'Failed to get IP', 'color_name': '#E01B22', 'text_schema': '\x1b[31m', 'mes_type': 'info'}]}```. –  Aug 29 '22 at 07:13
  • That is, for some reason the proxies I get from https://www.us-proxy.org/ don't work for me that way. Moreover, 5-6 requests go through and then the application stalls, as if a request hangs with no response received. So my question is, in effect, a complex one: what's wrong with the proxies I'm getting? Where can I get temporary free proxies? Or is something else missing for the application to work? –  Aug 29 '22 at 07:17
  • And I cannot understand the difference between what you do and what I do. Please do not downvote; help me figure it out! –  Aug 29 '22 at 07:27
  • The difference is that you need to set both the `http` and `https` keys in `proxy_dict`, and as you can see from this example, it works just fine using proxies from `us-proxy.org`; that's exactly where those proxy addresses in my example came from. – larsks Aug 29 '22 at 11:13