36

This is the script:

import requests
import json
import urlparse
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=1))

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy=json.loads(line)

    with open('urls.txt') as urls:
        for line in urls:

            url=line.rstrip()
            data=requests.get(url, proxies=proxy)
            data1=data.content
            print data1
            print {'http': line}

As you can see, it's trying to access a list of URLs through a list of proxies. Here is the urls.txt file:

http://api.exip.org/?call=ip

Here is the proxies.txt file:

{"http":"http://107.17.92.18:8080"}

I got this proxy from www.hidemyass.com. Could it be a bad proxy? I have tried several and get the same result each time. Note: if you are trying to replicate this, you may have to update the proxy to a recent one from hidemyass.com; they seem to stop working eventually. Here is the full error and traceback:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    data=requests.get(url, proxies=proxy)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 454, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 144, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 327, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: '')
Athena
BigBoy1337
  • Is the indentation in your example correct? – Lukasa Aug 28 '13 at 10:26
  • Because the bodies of your `for` loops aren't indented. That seems like it'd raise an IndentationError to me. – Lukasa Sep 03 '13 at 08:01
  • Oh shoot, you're right. I copied the code wrong. The question still stands, though. – BigBoy1337 Sep 03 '13 at 20:14
  • Your loops are still not right. The current code will only ever use the last proxy listed in proxies.txt. – brechin Sep 13 '13 at 12:13
  • I have exactly the same error with my ISP's proxy. I've seen the issue only with one specific URL (a POST request). I can make the requests work by disabling the proxy: `proxies={'https':None}` (using https). – Toni Aug 23 '14 at 12:20
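
The loop problem brechin points out can be sketched as follows. This is only an illustration, not the asker's code: `load_proxies`, `load_urls`, `fetch_all` and the `fetch` callable are hypothetical names introduced here. The point is that the URL loop has to sit inside the proxy loop, so that every proxy is actually tried rather than only the last one read from proxies.txt.

```python
import json


def load_proxies(lines):
    """Parse one JSON proxy mapping per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_urls(lines):
    """Strip whitespace and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_all(proxies, urls, fetch):
    """Call fetch(url, proxy) for every proxy/URL combination.

    `fetch` is a hypothetical callable supplied by the caller, for example
    lambda url, proxy: requests.get(url, proxies=proxy).content
    """
    results = []
    for proxy in proxies:      # outer loop: one proxy at a time
        for url in urls:       # inner loop: runs once per proxy
            results.append((proxy, url, fetch(url, proxy)))
    return results
```

With the files from the question, this would be called as `fetch_all(load_proxies(open('proxies.txt')), load_urls(open('urls.txt')), fetch)`, with `fetch` wrapping whatever request call is being tested.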

5 Answers

37

Looking at the stack trace you've provided, your error is caused by an httplib.BadStatusLine exception, which, according to the docs, is:

Raised if a server responds with a HTTP status code that we don’t understand.

In other words, whatever the proxy server returned (if it returned anything at all) could not be parsed by httplib, which performs the actual request.

From my experience with (writing) HTTP proxies, I can say that some implementations do not follow the specs too strictly (the RFCs on HTTP aren't easy reading, actually) or use hacks to work around old browsers that have flaws in their implementation.

So, answering this:

Could it be a bad proxy?

... I'd say that this is possible. The only real way to be sure is to see what the proxy server returns.

Try to debug it with a debugger, or grab a packet sniffer (something like Wireshark or Network Monitor) to analyze what happens on the network. Knowing exactly what the proxy server returns should give you the key to solving this issue.
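
If a packet sniffer is not at hand, a rough alternative is to talk to the proxy over a plain socket and look at the first line it sends back. The sketch below is written for Python 3 and is an assumption-laden illustration, not part of requests or httplib: `looks_like_status_line` and `read_first_line` are hypothetical helpers, and the regular expression is only a loose check for the `HTTP/1.1 200 OK` shape. The empty `''` in the traceback suggests the proxy may have closed the connection without sending anything, which this check would report as an empty line.

```python
import re
import socket

# Loose pattern for a status line such as b"HTTP/1.1 200 OK".
STATUS_LINE = re.compile(rb"^HTTP/\d\.\d \d{3}( .*)?$")


def looks_like_status_line(line):
    """Return True if `line` (bytes, no trailing CRLF) resembles a status line."""
    return bool(STATUS_LINE.match(line))


def read_first_line(proxy_host, proxy_port, url, timeout=10.0):
    """Send a bare GET for `url` to the proxy and return its first reply line.

    An empty result means the proxy closed the connection without answering.
    """
    request = "GET {} HTTP/1.0\r\n\r\n".format(url).encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        with sock.makefile("rb") as reply:
            return reply.readline().rstrip(b"\r\n")
```

For the proxy in the question this would be `read_first_line('107.17.92.18', 8080, 'http://api.exip.org/?call=ip')`, followed by `looks_like_status_line(...)` on the result.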

Eugene Loy
  • I am not using httplib, unless it is included in the requests library? Aside from this, are you saying that my request goes through the proxy server to the webpage, back to the proxy server, and then what that proxy server tries to relay to me is unreadable? – BigBoy1337 Sep 10 '13 at 19:53
  • @BigBoy1337 about httplib: it seems to be used indirectly (in any case, you do have an exception from it in your traceback). About "are you saying that my request goes through the proxy server to the webpage, back to the proxy server, and then what that proxy server tries to relay to me is unreadable?": it is possible, but not necessarily so. All we know at the moment is that the reply from the proxy is not valid. ... – Eugene Loy Sep 10 '13 at 20:03
  • ... It is possible that the proxy encountered some internal error even before delivering the request to the final web server, and thus replied with an invalid reply. It is also possible that this internal error happened in the proxy after the web server sent a valid reply to the proxy. And on top of that, it is possible that the web server sent the proxy an invalid reply, which in turn resulted in an invalid reply from the proxy. As I've said in my answer, the most straightforward way to figure out the root cause is to fetch more data about what the proxy actually replied. – Eugene Loy Sep 10 '13 at 20:09
  • Doesn't "max retries exceeded with url" provide any possible clue? What could that mean? It sounds like the proxy tried to give a request to the web server but it kept telling it to try again (cause of some error). – BigBoy1337 Sep 10 '13 at 21:57
  • @BigBoy1337 not really. Your traceback can be interpreted as: failed to perform the request (the "Max retries exceeded with url [...]" part of the error message) **as a result of an `httplib.BadStatusLine`** exception raised somewhere earlier (the "Caused by [...]" part of the error message). Note that there is no info about why `httplib.BadStatusLine` was raised (other than the documentation). – Eugene Loy Sep 11 '13 at 06:09
  • I know that the proxy is a bad proxy in my case. I want to change it, but I have no idea where it is set in my CentOS server configuration. My server needs a proxy to connect to the internet, but the one in this `HTTPConnectionPool(..)` is the wrong one. Does anyone have a solution for me and for others in the future? – krozaine Jul 22 '16 at 21:57
9

Maybe you are overloading the proxy server by sending too many requests in a short period of time. You say that you got the proxy from a popular free proxy website, which means that you're not the only one using that server, and it's often under heavy load.

If you add some delay between your requests, like this:

from time import sleep

[...]

data=requests.get(url, proxies=proxy)
data1=data.content
print data1
print {'http': line}
sleep(1)

(Note the sleep(1), which pauses execution of the code for one second.)

Does it work?

  • Requests always adds the original reason for the exception at the end: requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: ''). In this example that is (Caused by <class 'httplib.BadStatusLine'>: ''). That means it wouldn't be a timing problem but rather an unhandled HTTP response, as per the answer provided. – PsyKzz Sep 12 '13 at 22:19
  • @MattPsyK I've had this "BadStatusLine" exception many times with a popular website (running standard Apache) when I sent too many requests at the same time (the sleep() trick worked for me), so maybe it's the same issue here... –  Sep 12 '13 at 23:04
  • @BigBoy1337. Try to increase the sleep time or limit the number of files you're requesting from the server. – CKM Mar 13 '17 at 05:27
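
The suggestion to space requests out, and to lengthen the pause when it is not enough, could look roughly like this. It is a sketch under stated assumptions rather than anything from the answer: `fetch_with_delay` and its `fetch` argument are hypothetical, and the doubling wait after each failure is one common choice, not something requests does for you. The `sleep` parameter exists only so the pauses can be replaced in tests.

```python
import time


def fetch_with_delay(fetch, urls, delay=1.0, attempts=3, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between calls and retrying failures.

    `fetch` is a hypothetical caller-supplied callable. Failed attempts wait
    delay, 2*delay, 4*delay, ... before the next try.
    """
    results = {}
    for url in urls:
        for attempt in range(attempts):
            try:
                results[url] = fetch(url)
                break
            except Exception as exc:  # narrow this to ConnectionError in real code
                if attempt == attempts - 1:
                    results[url] = exc  # record the final failure and move on
                else:
                    sleep(delay * 2 ** attempt)
        sleep(delay)  # fixed pause before the next URL
    return results
```

Here `fetch` might be `lambda url: requests.get(url, proxies=proxy).content`; a URL that keeps failing ends up mapped to its last exception instead of aborting the whole run.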
3
def hello(self):
    self.s = requests.Session()
    self.s.headers.update({'User-Agent': self.user_agent})
    return True

Try this. It worked for me :)

Ashu
1

This happens when you send too many requests to the public IP address of https://anydomainname.example.com/. As you can see, something is blocking access to the public IP address that maps to https://anydomainname.example.com/. One workaround is the following Python script, which looks up the public IP address of a domain and writes that mapping to the /etc/hosts file.

import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'

def get_public_ip() -> Tuple[str, str, str]:
    """
    Command to get public_ip address of host machine and endpoint domain
    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.

    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    #     grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """ curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "")
        .replace("http://", "")
        .replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """
    A function to write mapping of ip_address and domain name in /etc/hosts.
    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build

    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.

    Returns
    -------
    str
        Message to identify success or failure of the operation.

    """
    bash_command = """echo "{}    {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)

You can call the above script before running your desired function :)

Vaibhav Hiwase
1

This happens when you overload the server with multiple requests. To work around it, you can increase the time between requests. In my case, though, the best fix was to increase the number of retries for each request:

requests.adapters.DEFAULT_RETRIES = 5 # increase retries number
requests.get(url)

If this is still not helpful you can find more ways here.
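
Another way to raise the retry count, which does not rely on a module-level default being picked up by the installed requests version, is to mount an `HTTPAdapter` on a session, as the question's script already starts to do. The sketch below assumes a reasonably recent requests release; `make_session` is a hypothetical helper name.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(retries=5):
    """Return a Session whose HTTP and HTTPS adapters retry failed connections."""
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

The retries apply only to requests sent through that session, for example `make_session().get(url, proxies=proxy)`. The script in the question mounts an adapter on `s` but then calls `requests.get`, which uses a fresh session and so never sees the mounted adapter.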

dkaradima