I am trying to check whether a link is broken. For that I send each element (a link) from a list of dictionaries through a while loop and test it with urllib.request. The goal is to remove only the broken links from the list. The list contains links to different articles from https://jamanetwork.com/, and I want to be able to download the articles that exist.

However, I am getting a ConnectionResetError: [Errno 104] Connection reset by peer. I only get that error for links from https://jamanetwork.com/ (it happens on every page of that site), while the code seems to work fine for other websites.

My question is: am I missing something here, or is it a server-side issue?

Here is my code (python3):

import urllib.request

i = 0
while i < len(dicts):
    url = dicts[i]['link']
    try:
        with urllib.request.urlopen(url) as f:
            status = f.getcode()
        i += 1
    except:
        del dicts[i]

Here is a traceback:

https://jamanetwork.com/
---------------------------------------------------------------------------

ConnectionResetError                      Traceback (most recent call last)
<ipython-input-59-8d93b45dbd14> in <module>()
     22 print(url)
     23 
---> 24 with urllib.request.urlopen("https://jamanetwork.com/") as f:
     25   status = f.getcode()
     26   print(status)

12 frames
/usr/lib/python3.6/ssl.py in read(self, len, buffer)
    629         """
    630         if buffer is not None:
--> 631             v = self._sslobj.read(len, buffer)
    632         else:
    633             v = self._sslobj.read(len)

ConnectionResetError: [Errno 104] Connection reset by peer

Any suggestions are appreciated, thanks!

gulkho

1 Answer

Based on this answer: you can't resolve a server-side error, but you can handle it.

Since the connection reset happens on the server, there is nothing you can fix on your end. What you can do is wrap the request in a try .. except block so the loop handles the exception and moves on:

import urllib.request
import urllib.error

i = 0
while i < len(dicts):
    url = dicts[i]['link']
    try:
        f = urllib.request.urlopen(url)
    except (urllib.error.URLError, ConnectionResetError):
        # The request failed, so treat the link as broken and drop it.
        # Do not advance i: the next item has shifted into this index.
        del dicts[i]
    else:
        # The request succeeded; read the status and move to the next link.
        with f:
            status = f.getcode()
        i += 1
Elbo Shindi Pangestu
  • The problem is that my list contains links to different articles from https://jamanetwork.com/ and I want to be able to download articles that exist. The code above just skips all of them, even if page exists, just because link returns ConnectionResetError – gulkho May 15 '20 at 23:43
  • Can you attach examples of the difference? – Elbo Shindi Pangestu May 16 '20 at 00:04
  • Here is an example of broken link: https://jamanetwork.com/journals/jama/fullarticle/2706122?resultClick=1 and a working one: https://jamanetwork.com/channels/health-forum/fullarticle/2760097?resultClick=1 – gulkho May 16 '20 at 00:23
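A possible way forward, based on the comment thread above: distinguish a genuine HTTP error (such as a 404 for a removed article) from a connection-level failure, and send a browser-like User-Agent header. Some servers reset connections coming from the default Python-urllib user agent, which could explain why even working pages fail here. This is only a sketch under that assumption; the `check_link` helper and the `Mozilla/5.0` header value are illustrative, not part of the original code.

```python
import urllib.request
import urllib.error

def check_link(url):
    # Hypothetical helper. Sending a browser-like User-Agent is an
    # assumption: some sites reset connections from the default
    # "Python-urllib" agent.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req) as f:
            return f.getcode()      # e.g. 200 for a working page
    except urllib.error.HTTPError as e:
        return e.code               # e.g. 404 for a genuinely broken link
    except (urllib.error.URLError, ConnectionResetError):
        return None                 # connection-level failure, not a 404
```

With a helper like this you could keep only the entries whose link returns 200 (for example, `dicts[:] = [d for d in dicts if check_link(d['link']) == 200]`), instead of deleting every entry whose request raises for any reason.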