How can I identify which websites in a list do not exist?

Question

Good afternoon. For a university Python project I need to extract a table from a website, but sometimes the link doesn't exist, so I need my loop to ignore that link and move on to the next one. How can I do that?

I'm using Python to create a dataset of soundtracks. I used BeautifulSoup to extract the HTML, but the link doesn't exist, so I thought about adding a check like this:

if type(link)=="NoneType":

but it doesn't work. link is the result of soup.find, which returned nothing; in fact, type(link) gives NoneType. What can I do to recognise the nonexistent link? Thank you for the help.

Does this answer your question? [How to "test" NoneType in python?](https://stackoverflow.com/questions/23086383/how-to-test-nonetype-in-python) TL;DR: use `if link is None:` — Brian61354270, Jan 16 '23 at 00:15
Selcuk, ScottC gave me a solution, but do you want to see the code anyway? — Lukesky, Jan 16 '23 at 00:41

score 0 · Accepted Answer · answered Jan 16 '23 at 00:34

You can create a function to test whether a URL is reachable. If the request raises an error (including an HTTP error status such as 404), the function returns False; if it makes a successful connection, it returns True. You can then use this function to filter your list and produce a new list of valid URLs.

Here is an example:

Code:

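import requests

url_list = ["http://yahoo.com", "http://a_random_site_that_does_not_exist.com", "http://google.com"]

def is_valid_url(url):
    """Return True if the URL responds without an error, otherwise False."""
    try:
        # A timeout keeps the check from hanging on an unresponsive host.
        response = requests.get(url, timeout=10)
        # Raises an HTTPError for 4xx/5xx responses (e.g. 404 Not Found).
        response.raise_for_status()
        return True
    except requests.exceptions.RequestException:
        # Covers connection failures, timeouts, invalid URLs and HTTP errors.
        return False

# Keep only the URLs that passed the check.
valid_url_list = list(filter(is_valid_url, url_list))
print(valid_url_list)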

Output:

['http://yahoo.com', 'http://google.com']
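
As for the check in the question: soup.find returns None when nothing matches, and type(link) is a class, not the string "NoneType", so that comparison is always False. Here is a minimal sketch of the `is None` test suggested in the comments. The dictionary and the find_table helper are made up for illustration and only stand in for a call like soup.find, so this runs without any network access:

```python
# Hypothetical stand-in for parsed pages: None means "nothing was found",
# which is what soup.find(...) returns when there is no match.
fake_pages = {
    "page_a": "<table>...</table>",
    "page_b": None,
    "page_c": "<table>...</table>",
}

def find_table(page_name):
    """Hypothetical helper mimicking a lookup that may return None."""
    return fake_pages.get(page_name)

# Why the original check never fires: a class is compared with a string.
print(type(None) == "NoneType")  # False

tables = []
for name in fake_pages:
    link = find_table(name)
    if link is None:
        # Nothing found for this page: skip it and move to the next one.
        continue
    tables.append((name, link))

print([name for name, _ in tables])  # ['page_a', 'page_c']
```

In the real loop you would replace find_table(name) with your own soup.find call and keep the `if link is None: continue` part as it is.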
