You will need to decide exactly what it means for a link to be broken. Here is some sample code; you can adapt it to your needs by updating the is_broken
function:
import pandas as pd
import requests

# Prepare dummy data
links = ['https://google.com', 'http://thisisinvalid.de', 'http://docs.python-requests.org/en/master/api/broken']
df = pd.DataFrame(links, columns=['links'])

# Update as needed
def is_broken(link):
    try:
        # The timeout keeps an unresponsive host from hanging the whole run
        response = requests.get(link, timeout=10)
        return response.status_code == 404
    except requests.exceptions.RequestException:
        # Connection errors, DNS failures, and timeouts all count as broken
        return True

# df.ix was removed in pandas 1.0; plain column access does the same job
df['is_broken'] = df['links'].map(is_broken)
In this sample data, https://google.com is not broken, http://thisisinvalid.de cannot be resolved, and http://docs.python-requests.org/en/master/api/broken returns 404.
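
Since the definition of "broken" is up to you, it can help to keep that decision in one small function that is separate from the request itself. The following is only a sketch of one possible policy: the names `BROKEN_STATUS_CODES` and `status_is_broken` and the `treat_server_errors_as_broken` flag are made up for illustration, and the choice of 404, 410, and 5xx as "broken" is an assumption, not a rule.

```python
# Hypothetical policy: status codes that always count as broken.
# Assumption: 404 (Not Found) and 410 (Gone); extend as needed.
BROKEN_STATUS_CODES = {404, 410}


def status_is_broken(status_code, treat_server_errors_as_broken=True):
    """Decide whether a link is broken from its HTTP status code.

    Pass None when the request itself failed (DNS error, timeout, ...).
    """
    if status_code is None:
        # No response at all: treat the link as broken
        return True
    if status_code in BROKEN_STATUS_CODES:
        return True
    if treat_server_errors_as_broken and 500 <= status_code < 600:
        # Server errors may be temporary, so this part is optional
        return True
    return False
```

Inside is_broken you could then return status_is_broken(response.status_code) after a successful request and status_is_broken(None) in the except branch, so that changing the policy never touches the networking code.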