I would like to fetch an HTML page and read its content. I use requests (Python) and my code is very simple:

import requests

url = "http://www.romatoday.it"
r = requests.get(url)
print(r.text)

When I run this, I always get: Connection aborted.', error(110, 'Connection timed out'). If I open the URL in a browser, everything works fine.

If I use requests with other URLs, everything is OK.

I think this is something particular to "http://www.romatoday.it", but I don't understand what the problem is. Can you help me, please?

RoverDar
  • You have a typo in the URL: a comma instead of a dot – Abdulafaja Sep 06 '16 at 10:00
  • Thanks @Abdulafaja. I'm not familiar with this kind of problem. Can you explain? Thanks again – RoverDar Sep 06 '16 at 10:03
  • The problem isn't the comma (that was my editing mistake). The URL without the comma doesn't work either – RoverDar Sep 06 '16 at 10:11
  • Can you try traceroute or pathping (if you're on windows) to the URL? – Simon Hibbs Sep 06 '16 at 10:14
  • I ran "ping www.romatoday.it" and it works. – RoverDar Sep 06 '16 at 10:17
  • "tracert www.romatody.it " is ok too – RoverDar Sep 06 '16 at 10:20
  • It's also possible the web server is blocking requests based on the user agent header which identifies the client application. Here's how to spoof it http://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python – Simon Hibbs Sep 06 '16 at 10:20
  • Here's another directly relevant Q/A on this issue that might be of help. http://stackoverflow.com/questions/27422956/python-requests-library-sometimes-fails-to-open-site-that-a-browser-can-open – Simon Hibbs Sep 06 '16 at 10:24
  • How many times have you been hammering the server? – Padraic Cunningham Sep 06 '16 at 10:25
  • I think I have requested the URL 15 to 20 times, and every time I get Connection aborted.', error(110, 'Connection timed out') – RoverDar Sep 06 '16 at 10:30
  • Are you sleeping between requests? Also are you using a session or creating a new connection for each request? – Padraic Cunningham Sep 06 '16 at 10:30
  • "Are you sleeping between requests?" Sorry, I don't understand. – RoverDar Sep 06 '16 at 10:31
  • OK, let's digress: what version of requests are you using? – Padraic Cunningham Sep 06 '16 at 10:37
  • The requests version is 2.7.0 – RoverDar Sep 06 '16 at 10:42
  • I have now upgraded requests (2.11.1), but I still have the problem. – RoverDar Sep 06 '16 at 10:49
  • Have you turned on debugging? – Padraic Cunningham Sep 06 '16 at 10:58
  • Yes, I'm in a Django environment. With the new requests version I get: HTTPConnectionPool(host='www.romatoday.it', port=80): Max retries exceeded with url: /eventi/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 110] Connection timed out',)) – RoverDar Sep 06 '16 at 11:01
  • I create a new connection for each request – RoverDar Sep 06 '16 at 11:03
  • Could it be a header settings problem? These are the response headers shown in Chrome:
    Server: BlackStone
    Content-Type: text/html; charset=utf-8
    Transfer-Encoding: chunked
    Connection: close
    Vary: Accept-Encoding
    X-Powered-By: DYNAMIC+ BlackStone (build: 40626; date: Sat, 06 Aug 2016 15:14:02 +0200; server: cn03-www1)
    Vary: Cookie
    ETag: W/"jTgH1uatCeiJCmWovJqQU5"
    Date: Tue, 06 Sep 2016 10:22:03 GMT
    Expires: Tue, 06 Sep 2016 10:40:37 GMT
    Cache-Control: public, max-age=1114, post-check=1114, pre-check=1114
    X-XSS-Protection: 1
    Content-Encoding: gzip
    Set-Cookie: __bs=cn03-www1|V86Yz|V86Yz; path=/; HttpOnly
    – RoverDar Sep 06 '16 at 11:05
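
The suggestions in the comments above (a browser-like User-Agent, a reused session, a pause between attempts) can be combined in a short sketch. This is only an illustration under assumptions: the User-Agent string, the helper names make_session and fetch, and the retry and timeout values are not from the question, and nothing here is confirmed to fix the timeout for this site.

```python
import time

import requests

# Hypothetical User-Agent value chosen for illustration; it is not from the question.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/52.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_USER_AGENT):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, url, attempts=3, pause=5.0, timeout=(5, 30)):
    """Try the URL up to `attempts` times (at least 1), sleeping between tries.

    `timeout` is (connect seconds, read seconds). Returns the page text, or
    re-raises the last connection error if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()  # HTTP error statuses are not retried
            return response.text
        except (requests.exceptions.ConnectionError,
                requests.exceptions.Timeout) as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(pause)  # wait instead of retrying immediately
    raise last_error
```

It could be called as fetch(make_session(), "http://www.romatoday.it"). Reusing one session keeps the connection and headers across calls instead of opening a new connection for each request.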

2 Answers

Maybe the problem is that the comma here

>> url = "http://www.romatoday,it" 

should be a dot

>> url = "http://www.romatoday.it"

I tried that and it worked for me

Pani
  • Sorry, that was my mistake; I have edited it. The URL (without the comma) still doesn't work. Could it be a problem with the requests module version? – RoverDar Sep 06 '16 at 10:07
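
Because the same URL works from another machine, one thing that may be worth checking is whether the affected host can open a plain TCP connection to port 80 at all; ping uses ICMP, so a successful ping does not show that. A minimal Python 3 sketch, where the helper name can_connect and the 5-second timeout are assumptions for illustration:

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    conn.close()
    return True
```

Calling can_connect("www.romatoday.it") from the machine that shows the error, and again from one where the page loads, would indicate whether the timeout happens before any HTTP request is sent.
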

Hmm. Have you tried a package other than 'requests'? The code below gives the same result as your code.

import urllib

url = "http://www.romatoday.it" 
r = urllib.urlopen(url)
print r.read()

Here is a screenshot that I captured after running your code.
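
The urllib snippet in this answer uses the Python 2 API. A rough Python 3 equivalent, sketched with urllib.request plus an explicit User-Agent and timeout, might look like this; the header value and the helper names build_request and read_page are assumptions for illustration only:

```python
from urllib.request import Request, urlopen

# Hypothetical header value, used only for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher)"


def build_request(url, user_agent=USER_AGENT):
    """Create a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def read_page(url, timeout=10.0):
    """Fetch `url` and return the decoded body.

    A failed connection attempt surfaces as urllib.error.URLError.
    """
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, print(read_page("http://www.romatoday.it")) would print the page source if the connection succeeds.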

Junsuk Park