Why do I get a "Connection aborted" error when trying to crawl a specific website?

Question

I wrote a web crawler in Python 2.7, but a specific site cannot be downloaded, although it can be viewed in a browser.

My code is as follows:

# -*- coding: utf-8 -*-

import requests

# OK
url = 'http://blog.ithome.com.tw/'
url = 'http://7club.ithome.com.tw/'
url = 'https://member.ithome.com.tw/'
url = 'http://ithome.com.tw/'
url = 'http://weekly.ithome.com.tw'

# NOT OK
url = 'http://download.ithome.com.tw'
url = 'http://apphome.ithome.com.tw/'
url = 'http://ithelp.ithome.com.tw/'

try:
    response = requests.get(url)
    print 'OK!'
    print 'response.status_code: %s' %(response.status_code)

except Exception, e:
    print 'NOT OK!'
    print 'Error: %s' %(e)
print 'DONE!'
print 'response.status_code: %s' %(response.status_code)

Each time I try, I get this error:

C:\Python27\python.exe "E:/python crawler/test_ConnectionFailed.py"
NOT OK!
Error: ('Connection aborted.', BadStatusLine("''",))
DONE!
Traceback (most recent call last):
  File "E:/python crawler/test_ConnectionFailed.py", line 29, in <module>
    print 'response.status_code: %s' %(response.status_code)
NameError: name 'response' is not defined

Process finished with exit code 1
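
Note that the output contains two separate problems. The `NameError` is a side effect of the first failure: `requests.get` raised before `response` was ever assigned, so the final `print` had nothing to read. Below is a minimal sketch of one way to structure the check so that `response` is only touched on success. `fetch_status` is a hypothetical helper name, and passing the `get` callable in as a parameter is an assumption made here so the sketch does not depend on any particular HTTP library.

```python
def fetch_status(get, url):
    """Return (status_code, error); exactly one of the two is None.

    `get` is any callable that takes a URL and returns an object with a
    `status_code` attribute, for example `requests.get` (an assumption).
    """
    try:
        response = get(url)
    except Exception as e:  # broad on purpose, mirroring the original script
        return None, e
    # Reached only when get() returned, so `response` is always bound here.
    return response.status_code, None
```

With requests installed, this could be called as `status, error = fetch_status(requests.get, url)`, printing `error` when `status` is `None`. This does not fix the aborted connection itself; it only keeps the script from crashing a second time.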

Why is this happening and how can I fix it?

SOLVED! I switched to different proxy software, and now it works.
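
Since the cause turned out to be proxy-related, it may help to see which proxy Python picks up and to retry without it. The following is a minimal sketch using only the standard library, not a definitive diagnosis tool. `detected_proxies` and `direct_opener` are hypothetical helper names, and the imports are guarded so the sketch should run on Python 2.7 as well as Python 3.

```python
try:
    from urllib.request import ProxyHandler, build_opener, getproxies  # Python 3
except ImportError:
    from urllib import getproxies                    # Python 2.7
    from urllib2 import ProxyHandler, build_opener


def detected_proxies():
    """Return the proxy settings Python auto-detects, as a dict."""
    return getproxies()


def direct_opener():
    """Build an opener that ignores any auto-detected proxy."""
    # An empty dict tells ProxyHandler to use no proxy at all.
    return build_opener(ProxyHandler({}))
```

If `direct_opener().open(url)` succeeds for a URL that fails through the default settings, the proxy is a likely suspect. Whether a direct connection is possible at all depends on the network, so treat a failure here as inconclusive.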

Possible duplicate of [Python Requests getting ('Connection aborted.', BadStatusLine("''",)) error](http://stackoverflow.com/questions/33174804/python-requests-getting-connection-aborted-badstatusline-error) — M4rtini, Jan 19 '16 at 09:13
@MarcoFerrari Good edit, but where do these comments in code come from? — Remi Guan, Jan 19 '16 at 09:23
@M4rtini, thanks for editing, but my problem is not solved by the answer in the question. — oner ptkh, Jan 20 '16 at 02:25

score 1 · Accepted Answer · edited Sep 07 '22 at 18:50

I found that the urllib2 library works better than requests here.

import urllib2

def get_page(url):
    # Build the request, open it, and return the raw response body
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    return response.read()

url = "http://blog.ithome.com.tw/"
print get_page(url)

answered Jan 19 '16 at 09:28 by Hans · edited Sep 07 '22 at 18:50 by TylerH

Thanks for answering! But I tested it with "http://ithelp.ithome.com.tw/" and got a similar error: `httplib.BadStatusLine: ''` – oner ptkh Jan 19 '16 at 09:34

score 0 · Answer 2 · edited Sep 07 '22 at 18:50

Those domains could not be resolved. Running a normal ping on the URLs yields this result:

Command to run:

ping http://download.ithome.com.tw

Result

The host could not be resolved

With no response there is no status line, which would normally contain the status code.
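
One caveat when reproducing this check: ping expects a bare hostname, not a URL, so a string that begins with `http://` typically fails to resolve whether or not the site is up. Below is a minimal sketch of the same resolution check in Python, using only the standard library. `hostname_of` and `can_resolve` are hypothetical helper names, not part of any library.

```python
import socket

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7


def hostname_of(url):
    """Strip the scheme and path, e.g. 'http://a.b/c' -> 'a.b'."""
    host = urlparse(url).hostname
    # A bare hostname has no scheme, so urlparse finds no host; use it as-is.
    return host if host else url


def can_resolve(url):
    """Return True if DNS finds at least one address for the URL's host."""
    try:
        socket.getaddrinfo(hostname_of(url), None)
        return True
    except socket.gaierror:
        return False
```

A host that resolves here but still fails with `BadStatusLine` would suggest the problem lies somewhere between the client and the server rather than in DNS.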

answered Jan 19 '16 at 09:13 by cafebabe1991 · edited Sep 07 '22 at 18:50 by TylerH

Thanks for answering! But I tried pinging "http://ithome.com.tw" (which is accessible from my Python crawler) and got the same error. – oner ptkh Jan 19 '16 at 09:36
The URL you listed in the "NOT OK" section is http://weekly.ithome.com.tw, but now you mention http://ithome.com.tw. The latter opens, but the former doesn't. – cafebabe1991 Jan 19 '16 at 09:38
