HTTP headers are a possible issue, but not a likely one. A more probable cause is a proxy or firewall. I'll start by recapping the information from the comments that I think is relevant:
- You are using a system on which you do not have admin privileges.
- The system is configured to use a corporate proxy server.
- http://pypi.org works from your browser.
- http://pypi.org works from PowerShell on your system.
- http://pypi.org hangs with your python code.
- Your system is running Windows. (probably irrelevant, but might be worth noting)
Since both your browser and PowerShell seem to work fine (assuming you didn't change their settings), why are you trying to circumvent the proxy using python? (@vader asked this in the comments, and I didn't see a relevant response.)
If circumventing the proxy is material to your goal, skip ahead to the part that begins with "If the script still hangs on requests to pypi". If it isn't, and since other programs seem to work fine, I suggest trying with the proxy, using the system's original configuration:
- Remove the session.trust_env = False statement from the code.
- Test the code now. If it works, our job is done. Otherwise, keep reading.
- Revert all system changes you've made trying to make it work.
- Reboot your system.
I myself hate it when someone suggests that to me, but I've found two good reasons to do it: first, something might be stuck in the OS that a reboot will release; second, I might not remember everything I tinkered with, and a reboot might revert some of it for me.
- Test again: run the script, and also test with a browser and with PowerShell (as per @yarin-007's comment).
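Before re-testing, it can help to see which proxy settings Python itself detects. The following is a minimal sketch using only the standard library; the helper names (proxy_environment, system_proxies) and the list of variable names are my own assumptions for illustration. As far as I know, requests consults the same sources when trust_env is left at its default of True, and on Windows urllib.request.getproxies() falls back to the system (registry) settings when no proxy variables are set.

```python
import os
import urllib.request

# Assumption: these are the proxy-related variables worth inspecting.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")


def proxy_environment(environ=None):
    """Return the proxy-related variables found in environ (names matched case-insensitively)."""
    environ = os.environ if environ is None else environ
    return {name: value for name, value in environ.items() if name.upper() in PROXY_VARS}


def system_proxies():
    """Return the scheme -> proxy URL mapping that urllib detects on this machine."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    print("Proxy environment variables:", proxy_environment() or "none set")
    print("Proxies detected by urllib:", system_proxies() or "none detected")
```

If both come back empty while your browser clearly uses a proxy, the browser is probably configured through some other mechanism (for example an auto-configuration script), which would explain why the script behaves differently.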
If the script still hangs on requests to pypi, further analysis is required. In order to narrow down the options, I suggest the following:
- Disable redirects by setting allow_redirects=False. While requests should raise a TooManyRedirects exception if there is a redirect loop, this would help identify a case where a redirect target is hanging. pypi should redirect http to https regardless of user-agent or most other headers, which makes for a consistent, reliable request and limits other possible factors.
- Set a request timeout. The type of exception raised on timeout expiration can help identify the cause.
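If the first response turns out to be a redirect, the same two settings can be applied to each hop in turn. Below is a rough sketch of that idea; walk_redirects and fetch_once are hypothetical names of mine, not part of requests. Note that I use the standard library's http.client here rather than requests, and http.client connects directly without honoring proxy settings, so behind a mandatory proxy you would want to pass your own fetch function (for example one wrapping requests.get with allow_redirects=False).

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, timeout=7.0):
    """Issue one GET without following redirects; return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def walk_redirects(url, max_hops=10, fetch=fetch_once):
    """Follow redirects one hop at a time; return a list of (url, status or error text)."""
    hops = []
    for _ in range(max_hops):
        try:
            status, location = fetch(url)
        except (OSError, http.client.HTTPException) as exc:
            # A timeout or connection failure identifies the hop that misbehaves.
            hops.append((url, repr(exc)))
            break
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops
```

Calling walk_redirects("http://pypi.org") and printing the result shows each URL visited; the last entry is either a final status code or the exception raised by the hop that failed.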
The following code demonstrates both settings. In your own code, don't specify port numbers; the defaults should work. I added them explicitly here because each one demonstrates a different possible scenario:
#!/usr/bin/env python
import socket
import timeit

import requests

TIMEOUT = (4, 7)  # (connect timeout per IP address, read timeout), in seconds


def get_url(url, timeout=TIMEOUT):
    """Fetch url without following redirects and report the outcome."""
    try:
        response = requests.get(url, timeout=timeout, allow_redirects=False)
        print(f"Status code: {response.status_code}", end="")
        if response.status_code in (301, 302):
            print(f", Location: {response.headers.get('location')}", end="")
        print(".")
    except Exception as e:
        # The exception type is the interesting part, so print its repr.
        print(f"Exception caught: {e!r}")
    finally:
        print(f"Fetching url '{url}' done", end="")


def time_url(url):
    """Run get_url once and print how long it took."""
    print(f"Trying url '{url}'")
    total = timeit.timeit(f"get_url('{url}')", number=1, globals=globals())
    print(f" in: {str(total)[:4]} seconds")
    print("=============")


def print_expected_conntimeout(server):
    """List the server's IP addresses and the resulting worst-case connect timeout."""
    r = socket.getaddrinfo(server, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    print(f"IP addresses of {server}:\n" + "\n".join(addr[-1][0] for addr in r))
    print(f"Got {len(r)} addresses, so expecting a total ConnectTimeout of {len(r) * TIMEOUT[0]}")


def main():
    scheme = "http://"
    server = "pypi.org"
    uri = f"{scheme}{server}:{{port}}".format
    print_expected_conntimeout(server)
    # OK/redirect (301)
    time_url(uri(port=80))
    # READ TIMEOUT after 7s
    time_url(uri(port=8080))
    # CONNECTION TIMEOUT after 4 * ip_addresses
    time_url(uri(port=8082))
    # REJECT
    time_url('http://localhost:80')


if __name__ == "__main__":
    main()
For me, this outputs:
$ ./testnet.py
IP addresses of pypi.org:
151.101.128.223
151.101.0.223
151.101.64.223
151.101.192.223
Got 4 addresses, so expecting a total ConnectTimeout of 16
Trying url 'http://pypi.org:80'
Status code: 301, Location: https://pypi.org/.
Fetching url 'http://pypi.org:80' done in: 0.66 seconds
=============
Trying url 'http://pypi.org:8080'
Exception caught: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='pypi.org', port=8080): Read timed out. (read timeout=7)"))
Fetching url 'http://pypi.org:8080' done in: 7.21 seconds
=============
Trying url 'http://pypi.org:8082'
Exception caught: ConnectTimeout(MaxRetryError("HTTPConnectionPool(host='pypi.org', port=8082): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x103ec4730>, 'Connection to pypi.org timed out. (connect timeout=4)'))"))
Fetching url 'http://pypi.org:8082' done in: 16.0 seconds
=============
Trying url 'http://localhost:80'
Exception caught: ConnectionError(MaxRetryError("HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x103ec44c0>: Failed to establish a new connection: [Errno 61] Connection refused'))"))
Fetching url 'http://localhost:80' done in: 0.00 seconds
=============
Now to explain the four cases:
- A successful request to http://pypi.org returns a 301 redirect to https.
This is what you should get. If this is what you do get after adding allow_redirects=False, then the prime suspect is the redirect chain, and I suggest similarly checking each location header's value for every redirect response you receive, until you find the URL that hangs.
- Connection to port 8080 is successful (successful 3-way handshake), but the server does not return a proper response, and "hangs". requests raises a ReadTimeout exception.
If your script raises this exception, it is likely that you are connecting to some sort of proxy that does not properly relay (or actively blocks) the request or the response. There might be some system setting other than trust_env controlling this, or some appliance attached to the network's infrastructure.
- Connection to port 8082 is not successful; a 3-way handshake could not be established, and requests raises a ConnectTimeout exception. Note that a connection is attempted to each IP address found, so the overall wait is the 4-second timeout multiplied by the number of addresses.
If this is what you see, it is likely that there is some firewall between your machine and pypi, which either prevents your SYN packets from getting to their destination, or prevents the SYN+ACK packet from getting back from the server to your machine.
- The fourth case is one I don't believe you'll encounter, but in case you do, it is worth explaining. requests raises a ConnectionError ("Connection refused").
In this case, either the SYN packet reached a server which does not listen on the desired port (which would be weird, possibly meaning you didn't really reach pypi), or a firewall REJECTed your SYN packet (vs. simply DROPping it).
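The last three cases can also be told apart at the TCP level, without requests in the picture. Here is a minimal sketch under that assumption; classify_tcp is a hypothetical helper of mine, and it connects directly to the target (it does not go through any configured proxy), so it describes the direct network path only.

```python
import socket


def classify_tcp(host, port, connect_timeout=4.0, read_timeout=7.0):
    """Return a coarse label describing how a plain-TCP HTTP probe to host:port behaves."""
    try:
        # create_connection tries each resolved address in turn.
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except ConnectionRefusedError:
        return "refused"          # closed port, or a firewall that REJECTs
    except socket.timeout:
        return "connect-timeout"  # no SYN+ACK came back: packets are likely DROPped
    except OSError as exc:
        return f"other: {exc!r}"  # e.g. DNS failure or unreachable network
    with sock:
        sock.settimeout(read_timeout)
        try:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            return "read-timeout"  # handshake succeeded but nothing answered
        except OSError as exc:
            return f"other: {exc!r}"
    return "response" if data else "closed-without-response"
```

For example, classify_tcp("pypi.org", 80) should roughly correspond to the first case, while the labels "read-timeout", "connect-timeout" and "refused" map to the second, third and fourth cases respectively.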
Another thing worth paying attention to is pypi's IP addresses, as printed by the provided script. While IPv4 addresses are not guaranteed to keep their assignment, if you find that yours are significantly different, that would suggest that you are not actually connecting to the real pypi servers, so the responses are unpredictable (including hangs). Following are pypi's IPv4 and IPv6 addresses:
pypi.org has address 151.101.0.223
pypi.org has address 151.101.64.223
pypi.org has address 151.101.128.223
pypi.org has address 151.101.192.223
pypi.org has IPv6 address 2a04:4e42::223
pypi.org has IPv6 address 2a04:4e42:200::223
pypi.org has IPv6 address 2a04:4e42:400::223
pypi.org has IPv6 address 2a04:4e42:600::223
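The comparison can be automated. The sketch below is only an illustration: the two networks in EXPECTED_NETWORKS are my own generalization from the addresses listed above (an assumption, not an official list), and resolve and unexpected_addresses are hypothetical helper names.

```python
import ipaddress
import socket

# Assumption: ranges generalized from the addresses listed above.
EXPECTED_NETWORKS = (
    ipaddress.ip_network("151.101.0.0/16"),
    ipaddress.ip_network("2a04:4e42::/32"),
)


def resolve(server):
    """Return the sorted, de-duplicated IP addresses the local resolver gives for server."""
    infos = socket.getaddrinfo(server, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(addresses, networks=EXPECTED_NETWORKS):
    """Return the addresses that fall outside every expected network."""
    result = []
    for text in addresses:
        addr = ipaddress.ip_address(text)
        if not any(addr.version == net.version and addr in net for net in networks):
            result.append(text)
    return result
```

Running unexpected_addresses(resolve("pypi.org")) on the affected machine should return an empty list if name resolution looks sane; anything else (for example a private 10.x.x.x address) hints at DNS interception or a transparent proxy.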
Finally, as we've touched on the different IP protocol versions, it is also possible that when initiating a connection, your system attempts to use a protocol which has a faulty route to the destination (e.g. trying IPv6, but one of the gateways mishandles that traffic). Usually a router would reply with an ICMP failure message, but I've seen cases where that doesn't happen (or isn't properly relayed back). I wasn't able to determine the root cause as the route was out of my control, but forcing a specific protocol solved that specific issue for me.
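To check whether one protocol version is the culprit, each address family can be tried on its own. This is a minimal sketch with a hypothetical helper, try_family; like any direct socket test, it bypasses the proxy.

```python
import socket


def try_family(host, port, family, timeout=4.0):
    """Attempt a TCP connection using only one address family; return (ok, detail)."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"no address for this family: {exc}"
    last_error = None
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True, sockaddr[0]  # the address that accepted the connection
        except OSError as exc:
            last_error = exc  # remember the failure and try the next address
    return False, repr(last_error)
```

Comparing try_family("pypi.org", 443, socket.AF_INET) with try_family("pypi.org", 443, socket.AF_INET6) shows whether only one of the two protocols fails. With requests, a commonly cited way to force IPv4 is to override urllib3.util.connection.allowed_gai_family so that it returns socket.AF_INET; treat that as an assumption to verify against your installed urllib3 version.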
I hope this provides some good debugging vectors. If it helps, please add a comment, as I'm curious about what you find.