Answer here: How to join absolute and relative URLs?
I want to check internal links with BeautifulSoup and Selenium.
The script works when a link has a full URL:
    <a href="http...." />
The script does NOT work when a link has a partial (relative) URL:
    <a href="/internal_link.php" />
My Python script:
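    import re
    from bs4 import BeautifulSoup

    # r (page HTML), exc (substrings to exclude) and site (current site URL)
    # are defined earlier in the script
    soup = BeautifulSoup(r, 'html5lib')
    links = []
    for link in soup.find_all('a'):
        href = str(link.get('href')).lower()
        # skip any link that contains an excluded word
        if any(word in href for word in exc):
            continue
        # keep the first run of non-whitespace characters
        match = re.search(r'(\S+)', href)
        if match is None:
            continue
        st = match.group(0)
        if site in st:  # 2 SCENARIOS HERE
            links.append(st)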
CASE 1: check all links (full path):
    if "http" in st:
CASE 2: check only internal links (site is the current page, full path):
    if site in st:
So I'm looking for a way to collect links even when the href does not contain the full URL.
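
One possible approach, in line with the "join absolute and relative URLs" pointer at the top, is to resolve every href against the URL of the page it came from with `urllib.parse.urljoin` before filtering. The sketch below is only an illustration under assumptions: `resolve_links`, `page_url` and `internal_only` are hypothetical names, and the example.com addresses are placeholders. It treats a link as internal when its host equals the host of the current page, which is stricter than the substring test in the script.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(hrefs, page_url, internal_only=True):
    """Turn raw href values into absolute http(s) URLs.

    hrefs         -- href strings (None entries are skipped)
    page_url      -- URL of the page the hrefs were found on (assumed known)
    internal_only -- keep only links whose host matches page_url's host
    """
    page_host = urlparse(page_url).netloc
    links = []
    for href in hrefs:
        if not href:
            continue  # <a> tag without an href
        # urljoin keeps absolute URLs as they are and resolves relative ones
        absolute = urljoin(page_url, href.strip())
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if internal_only and parsed.netloc != page_host:
            continue  # CASE 2: internal links only
        links.append(absolute)
    return links


# Placeholder data, not taken from the original script
hrefs = [
    "/internal_link.php",
    "http://example.com/about.php",
    "http://other.example.org/",
    "mailto:me@example.com",
    None,
]
print(resolve_links(hrefs, "http://example.com/dir/page.php"))
```

In the script above, the href list could come from `[a.get('href') for a in soup.find_all('a')]`. Passing `internal_only=False` would correspond to CASE 1, and the default to CASE 2.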