How to handle IncompleteRead: in python

Question

I am trying to fetch some data from a website, but the request fails with an IncompleteRead error. The data I am trying to get is a huge set of nested links. I did some research online and found that this might be due to a server error (a chunked transfer encoding finishing before reaching the expected size). I also found a workaround for this at this link.

However, I am not sure how to use this in my case. The following is the code I am working on:

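import urllib2
import urlparse

import mechanize
# Assuming BeautifulSoup 4; with BeautifulSoup 3 use: from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup

br = mechanize.Browser()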
br.addheaders = [('User-agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1;Trident/5.0)')]
urls = "http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brands"
page = urllib2.urlopen(urls).read()
soup = BeautifulSoup(page)
links = soup.findAll('img',url=True)

for tag in links:
    name = tag['alt']
    tag['url'] = urlparse.urljoin(urls, tag['url'])
    r = br.open(tag['url'])
    page_child = br.response().read()
    soup_child = BeautifulSoup(page_child)
    contracts = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "tariff-duration"})]
    data_usage = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "allowance"})]
    print contracts
    print data_usage

Please help me with this. Thanks.

Usually, after I get the error I try another request and it has always succeeded. Maybe 100 times out of 100 trials. — Blaszard, Sep 08 '16 at 06:01

score 28 · Accepted Answer · edited Feb 24 '18 at 07:32

The link you included in your question is simply a wrapper around urllib's read() function that catches any IncompleteRead exceptions for you. If you don't want to implement this entire patch, you could always just put a try/except block around the place where you read your links. For example:

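import httplib
import urllib2

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead as e:
    # e.partial holds whatever was received before the read was cut short
    page = e.partial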

For Python 3:

import http.client
from urllib import request

try:
    page = request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    page = e.partial
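
If the same pattern is needed in several places, it can be folded into a small helper. The following is only a minimal Python 3 sketch: the names read_allowing_partial and fetch are hypothetical and not part of urllib. Because the partial body may be truncated, the helper also returns a flag saying whether the read completed.

```python
import http.client
from urllib import request


def read_allowing_partial(response):
    """Return (body, complete) for any object with a read() method.

    Hypothetical helper, not part of urllib. `complete` is False when the
    read was cut short and only partial data was received.
    """
    try:
        return response.read(), True
    except http.client.IncompleteRead as e:
        # e.partial holds the bytes received before the read was cut short.
        return e.partial, False


def fetch(url):
    # Assumes `url` is reachable; errors other than IncompleteRead propagate.
    with request.urlopen(url) as response:
        return read_allowing_partial(response)
```

A caller can then decide whether truncated data is acceptable, for example by re-requesting the page when the flag is False.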

edited Feb 24 '18 at 07:32 by user9404228 · answered Jan 21 '13 at 15:53 by Kyle

This is not a solution; it just works around the problem with try/except. – Simone Jan 04 '20 at 16:42

Sérgio · Answer 2 · 2021-03-17T15:06:59.477

10

Note: the original answer is Python 2 only (it was published in 2013); a Python 3 variant is given at the end.

In my case, I found that sending the request as HTTP/1.0 fixed the problem. Add this:

import httplib
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

Then I make the request:

req = urllib2.Request(url, post, headers)
filedescriptor = urllib2.urlopen(req)
img = filedescriptor.read()

Afterwards I switch back to HTTP/1.1 (for connections that support it):

httplib.HTTPConnection._http_vsn = 11
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'

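The trick is to use HTTP/1.0 instead of the default HTTP/1.1. HTTP/1.1 can handle chunked responses, but for some reason the web server doesn't send them correctly. HTTP/1.0 has no chunked transfer encoding, so making the request as HTTP/1.0 avoids the problem.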

On Python 3, the import above fails with

ModuleNotFoundError: No module named 'httplib'

Use the http.client module instead, which solves the problem:

import http.client as http
http.HTTPConnection._http_vsn = 10
http.HTTPConnection._http_vsn_str = 'HTTP/1.0'
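
To avoid forgetting the switch back to HTTP/1.1, the two steps can be wrapped in a context manager. This is a minimal Python 3 sketch under the same assumption as the answer: it relies on the private attributes _http_vsn and _http_vsn_str, which are not a public API. The name force_http10 is hypothetical.

```python
import contextlib
import http.client


@contextlib.contextmanager
def force_http10():
    """Temporarily make http.client send HTTP/1.0 requests.

    Relies on the private class attributes _http_vsn and _http_vsn_str,
    the same ones patched above; they could change between versions.
    """
    conn_cls = http.client.HTTPConnection
    saved = (conn_cls._http_vsn, conn_cls._http_vsn_str)
    conn_cls._http_vsn = 10
    conn_cls._http_vsn_str = 'HTTP/1.0'
    try:
        yield
    finally:
        # Restore the previous values even if the request raised.
        conn_cls._http_vsn, conn_cls._http_vsn_str = saved
```

Usage would look like `with force_http10(): data = urllib.request.urlopen(url).read()`. Since these are class attributes, the change affects every connection in the process while the block is active, so this is not safe to use alongside other threads making requests.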

answered Dec 17 '13 at 22:13 by Sérgio · edited Mar 17 '21 at 15:06

@Sérgio, I was having the same issue while using `urllib2.urlopen(url).read()`, but the above code solved it. Can you please explain this? – Nishant Nawarkhede Feb 19 '14 at 15:19

The web server doesn't handle chunks correctly, because it is old, or is Microsoft, and/or it has a slow connection... – Sérgio Jan 05 '20 at 02:17
Works for me as well, probably the server is old, thanks @Sérgio – dbouz Apr 12 '22 at 16:01

score 7 · Answer 3 · answered Aug 09 '14 at 01:29

What worked for me was catching IncompleteRead and keeping the data that was read before the failure, by putting the read into a loop like the one below. (Note: I am using Python 3.4.1, and the urllib library changed between 2.7 and 3.4.)

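import http.client
import json
import urllib.request


# The function name is only illustrative; the original snippet was the body of a function.
def fetch_json(url, data=None):
    try:
        requestObj = urllib.request.urlopen(url, data)
        responseJSON = ""
        while True:
            try:
                responseJSONpart = requestObj.read()
            except http.client.IncompleteRead as icread:
                # Keep what arrived before the read was cut short, then read again.
                responseJSON = responseJSON + icread.partial.decode('utf-8')
                continue
            else:
                responseJSON = responseJSON + responseJSONpart.decode('utf-8')
                break

        return json.loads(responseJSON)

    except Exception as RESTex:
        print("Exception occurred making REST call: " + str(RESTex))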
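
One caveat with the loop above: decoding each piece separately can raise UnicodeDecodeError if a multi-byte UTF-8 character happens to be split across two pieces. A minimal sketch of a variant that collects raw bytes and decodes once at the end is shown here. The names read_all_text and max_attempts are hypothetical, and the cap on attempts is an added assumption so the loop cannot run forever.

```python
import http.client


def read_all_text(response, encoding='utf-8', max_attempts=5):
    """Collect bytes across IncompleteRead errors, then decode once.

    Hypothetical helper; max_attempts is an assumed safety cap so the loop
    stops even if every read() call fails.
    """
    chunks = []
    for _ in range(max_attempts):
        try:
            chunks.append(response.read())
            break  # read() finished normally, nothing more to collect
        except http.client.IncompleteRead as e:
            # Keep the bytes that did arrive and try reading again.
            chunks.append(e.partial)
    return b''.join(chunks).decode(encoding)
```

If every attempt fails, the function still returns whatever was collected, so the result may be truncated.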

score 1 · Answer 4 · answered Jun 21 '15 at 16:44

You can use requests instead of urllib2. requests is based on urllib3, so it rarely has this kind of problem. Put the call in a loop that tries up to 3 times and it will be much more robust. You can use it this way:

import inspect
import sys
import time

import requests

# This snippet runs inside a class method; self.crawling holds the URL.
msg = None
for i in [1, 2, 3]:
    try:
        r = requests.get(self.crawling, timeout=30)
        msg = r.text
        if msg:
            break
    except Exception as e:
        sys.stderr.write('Got error when requesting URL "' + self.crawling + '": ' + str(e) + '\n')
        if i == 3:
            sys.stderr.write('{0.filename}@{0.lineno}: Failed requesting from URL "{1}" ==> {2}\n'.format(
                inspect.getframeinfo(inspect.currentframe()), self.crawling, e))
            raise
        # No wait after the first failure, 10 seconds after the second.
        time.sleep(10 * (i - 1))
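
The same retry idea also works without requests. Below is a sketch using only the standard library, for cases where the failure is specifically IncompleteRead. The names fetch_with_retry, attempts and delay are hypothetical, and fetch stands for any zero-argument callable you supply that performs the download.

```python
import http.client
import time


def fetch_with_retry(fetch, attempts=3, delay=1.0):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: `fetch` is any zero-argument callable that returns
    the body and may raise http.client.IncompleteRead.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except http.client.IncompleteRead:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            time.sleep(delay * attempt)  # wait a little longer each time
```

For example, `fetch_with_retry(lambda: urllib.request.urlopen(url).read())` issues a fresh request on every attempt, which matters because a response that failed halfway cannot simply be read again.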

I also added a retry loop, but any time I get an `IncompleteRead` exception, it breaks out of my while loop and won't make another attempt. Any thoughts? — Addison Klinke, Apr 02 '21 at 13:52

score 0 · Answer 5 · answered May 18 '15 at 19:08

I found that my virus detector/firewall was causing this problem, specifically the "Online Shield" part of AVG.

answered May 18 '15 at 19:08 by nigel76

score 0 · Answer 6 · answered May 06 '20 at 23:46

For Python 3, FYI:

from urllib import request
import http.client

url = 'http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brand'


# The function name is only illustrative; file_path is supplied by the caller.
def save_page(url, file_path):
    try:
        response = request.urlopen(url)
        data = response.read()
    except http.client.IncompleteRead as e:
        # Keep the bytes that arrived before the read was cut short.
        data = e.partial
    except Exception as result:
        print("Unknown error: " + str(result))
        return

    # Save the (possibly partial) data to disk.
    with open(file_path, 'wb') as f:
        print("save -> %s " % file_path)
        f.write(data)

score -1 · Answer 7 · answered Oct 28 '15 at 17:46

I tried all these solutions and none of them worked for me. What did work was using http.client directly instead of urllib (Python 3):

import http.client

conn = http.client.HTTPConnection('www.google.com')
conn.request('GET', '/')
r1 = conn.getresponse()
page = r1.read().decode('utf-8')

This works perfectly every time, whereas with urllib it was raising an IncompleteRead exception every time.

answered Oct 28 '15 at 17:46 by Brian

This does not always work; it looks like the solution is quite old. Can you please help with a newer solution for Python 3? – Prem KTiw Jan 16 '18 at 11:18

score -2 · Answer 8 · answered Feb 16 '17 at 01:52

I just caught one more exception type to get past this problem, like this:

import logging

import requests

try:
    r = requests.get(url, timeout=timeout)
except (requests.exceptions.ChunkedEncodingError, requests.ConnectionError) as e:
    logging.error("There is an error: %s", e)

answered Feb 16 '17 at 01:52 by KJoker
