I have the following:
html =
'''<div class=“file-one”>
<a href=“/file-one/additional” class=“file-link">
<h3 class=“file-name”>File One</h3>
</a>
<div class=“location”>
Down
</div>
</div>'''
And would like to get just the text of href
which is /file-one/additional
. So I did:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
link_text = “”
for a in soup.find_all(‘a’, href=True, text=True):
link_text = a[‘href’]
print “Link: “ + link_text
But it just prints a blank, nothing. Just Link:
. So I tested it out on another site but with a different HTML, and it worked.
What could I be doing wrong? Or is there a possibility that the site intentionally programmed to not return the href
?
Thank you in advance and will be sure to upvote/accept answer!