
My aim is to extract the HTML from all the links on the first page of results after entering a Google search term. I work behind a proxy, so this is my approach.

1. I first used mechanize to enter the search term in the form; I've set the proxies and robots correctly.

2. After extracting the links, I used an opener built with urllib2.ProxyHandler, installed globally, to open the URLs individually.

However, this gives me the following error, and I have not been able to figure it out.
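For reference, step 2 can be sketched roughly as follows. This is a minimal sketch, not the exact code from my program: the proxy address `proxy.example.com:8080` and the helper names `build_proxy_opener` and `fetch_all` are hypothetical, and the import fallback is only there so the snippet also runs where `urllib2` has been renamed to `urllib.request`.

```python
try:
    import urllib2 as request_lib          # Python 2
except ImportError:
    import urllib.request as request_lib   # Python 3 name for the same module

# Hypothetical proxy address; replace with the real one.
PROXY = "http://proxy.example.com:8080"


def build_proxy_opener(proxy_url):
    """Build an opener that routes both http and https requests through proxy_url."""
    handler = request_lib.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return request_lib.build_opener(handler)


def fetch_all(opener, urls):
    """Open each URL individually; store the body, or the error if opening fails."""
    pages = {}
    for url in urls:
        try:
            pages[url] = opener.open(url, timeout=10).read()
        except request_lib.URLError as exc:
            # The SSL failure described in this question surfaces here as a URLError.
            pages[url] = exc
    return pages


opener = build_proxy_opener(PROXY)
request_lib.install_opener(opener)  # make it the global default for urlopen()
```

Calling `fetch_all(opener, links)` with the extracted links is the point at which the error appears for https URLs.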

urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
Cœur
Manoj

2 Answers


Instead of copying and editing Python library modules, you can monkey-patch ssl.wrap_socket() in the ssl module by overriding the ssl_version keyword parameter. The following code can be used as-is. Put this at the start of your program before making any requests.

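import ssl
from functools import wraps

def sslwrap(func):
    # Wrap func so that every call is forced to use TLSv1,
    # whatever ssl_version keyword the caller supplied.
    @wraps(func)
    def bar(*args, **kw):
        kw['ssl_version'] = ssl.PROTOCOL_TLSv1
        return func(*args, **kw)
    return bar

# Replace the module-level function; code that calls ssl.wrap_socket
# after this point gets the patched version.
ssl.wrap_socket = sslwrap(ssl.wrap_socket)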
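To see why the override wins regardless of what the caller asks for, here is a minimal sketch of the same wrapper pattern applied to a stand-in function, so it can be run without a network or a particular ssl version. The names `force_kwarg` and `fake_wrap_socket` are hypothetical and exist only for this illustration; they are not part of the ssl module.

```python
from functools import wraps


def force_kwarg(name, value):
    """Return a decorator that overrides keyword `name` with `value` on every call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kw):
            kw[name] = value  # silently replaces whatever the caller passed
            return func(*args, **kw)
        return wrapper
    return decorator


def fake_wrap_socket(sock, ssl_version="SSLv23"):
    """Stand-in for ssl.wrap_socket: just report which version was requested."""
    return (sock, ssl_version)


patched = force_kwarg("ssl_version", "TLSv1")(fake_wrap_socket)

print(patched("sock"))                       # default is overridden
print(patched("sock", ssl_version="SSLv3"))  # explicit keyword is overridden too
```

One limitation of this pattern: it only handles callers that pass the parameter by keyword or omit it. If a caller passed `ssl_version` positionally, the wrapper would supply it a second time and the call would fail with a `TypeError`.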
chnrxn

It's a known bug. However, some solutions for it are mentioned in the comments on the linked bug report, which may be helpful to you: bug url.

NIlesh Sharma
  • Thank you, NIlesh. I found [this](https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/965371/comments/9) to be quite helpful, despite the fact that it might not be the best solution to just abandon TLS2. – Nick Merrill Feb 03 '13 at 08:13