
I have a list of links that I'm trying to scrape the HTML text from. It's a long list (titled annoying) and some of the links appear to be faulty. I'd like my code to ignore any link that produces an error and continue on down the list. I'm new to this, so any help is appreciated.

I attempted to use this answer, catch specific HTTP error in python, but I'm stuck on how to make my code move on to the next item in the list.

Here is my current code:

maybe1=[]

from bs4 import BeautifulSoup
import urllib.request
import urllib

try:
    for i in annoying:
        resp=urllib.request.urlopen(i)
        soup=BeautifulSoup(resp, 'lxml').encode('utf-8')

        maybe1.append(soup)

except urllib.error.HTTPError as err:
    skip=True

Thanks much!

kaci155

1 Answer

Just put the try/except inside the loop, so a failure only skips the current iteration:

from bs4 import BeautifulSoup
import urllib.request
import urllib.error

annoying_links = ['link1', 'link2']
maybe1 = []
for link in annoying_links:
    try:
        resp = urllib.request.urlopen(link)
        soup = BeautifulSoup(resp, 'lxml').encode('utf-8')
        maybe1.append(soup)
    except urllib.error.HTTPError:
        # a bad link is reported and skipped; the loop keeps going
        print('Skipped:', link)
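One thing to watch for: HTTPError only covers responses with a bad status code (404, 500, ...). A link with an unreachable or misspelled hostname raises URLError instead, which is the parent class of HTTPError, so catching URLError handles both cases in one clause. A minimal offline sketch of the pattern, using a hypothetical `fetch` stand-in for `urllib.request.urlopen` so it runs without a network:

```python
import urllib.error

# Hypothetical stand-in for urllib.request.urlopen: raises HTTPError
# for one link and URLError for another, so the sketch runs offline.
def fetch(link):
    if link == 'bad-status':
        raise urllib.error.HTTPError(link, 404, 'Not Found', hdrs=None, fp=None)
    if link == 'bad-host':
        raise urllib.error.URLError('name resolution failed')
    return '<html>ok</html>'

links = ['good', 'bad-status', 'bad-host']
collected = []
for link in links:
    try:
        collected.append(fetch(link))
    except urllib.error.URLError as err:
        # URLError is the parent of HTTPError, so both failures land here
        print('Skipped:', link, '-', err)

print(collected)  # only the good link's content survives
```

If you swap `urllib.error.HTTPError` for `urllib.error.URLError` in the loop above, faulty hostnames in your list get skipped too instead of crashing the run.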
grapes