
I've been trying to extract all the hyperlinks from an internal SharePoint website using Beautiful Soup in Python, but whenever I run the program I get zero results. When I check the website's view-source, it doesn't show any hyperlinks either. However, I can see all the links using the Inspect option in the browser. Is there any way I can extract those links using Python?

Code:

    import getpass
    import requests
    from requests_ntlm import HttpNtlmAuth
    from bs4 import BeautifulSoup

    def main():
        r = requests.get('https://abc[.]com/query?',
                         auth=HttpNtlmAuth(spuser, getpass.getpass()))
        print(r.status_code)
        soup = BeautifulSoup(r.content, "html.parser")
        for link in soup.find_all('div', {'class': "list_episode"}):
            print(link)

The above code produces no results.
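A quick way to confirm the symptom described above (links visible in Inspect but absent from view-source) is to count the `<a>` tags in the raw HTML. A minimal sketch, with hypothetical markup standing in for the SharePoint response:

```python
from bs4 import BeautifulSoup

# Hypothetical page shell: what a JavaScript-heavy site often returns
# as raw HTML -- an empty container plus a script bundle, no <a> tags.
raw_html = """
<html><body>
  <div id="app"></div>
  <script src="bundle.js"></script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")
print(len(soup.find_all("a")))  # 0: the links only exist after the JS runs
```

If this prints 0 while the browser's Inspect view shows anchors, the links are being injected client-side and requests/BeautifulSoup alone will never see them.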

tomerpacific
    You'll need to give us a sample of the code on the page. – yuuuu Dec 27 '21 at 18:09
  • Possible duplicate: https://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup – Ricky Dec 27 '21 at 18:10

2 Answers


When I checked the view source of the website, it also doesn't show any hyperlinks.

The site may be using JavaScript to fill in the links dynamically.

If so, you will likely need a browser to run the JavaScript before you parse the links.

Selenium is a tool you can drive from Python to access those links. See: https://selenium-python.readthedocs.io
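The approach above can be sketched as follows (an untested outline; `extract_links` is a hypothetical helper name, and the Selenium imports are kept inside the function so the sketch loads even without Selenium installed):

```python
def extract_links(url):
    """Render the page in a real browser, then collect every href.

    Sketch only: assumes Chrome and a matching chromedriver are on PATH.
    """
    # Local imports so this module still loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # By the time find_elements runs, the browser has executed the
        # page's JavaScript, so dynamically inserted links are present.
        anchors = driver.find_elements(By.TAG_NAME, "a")
        return [a.get_attribute("href") for a in anchors]
    finally:
        driver.quit()
```

For an NTLM-protected SharePoint site you may also need to deal with the browser's authentication prompt (for example by reusing a logged-in browser profile), which Selenium alone does not handle.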

Raymond Hettinger
from bs4 import BeautifulSoup as Soup
import requests

url = "https://stackoverflow.com"
page = requests.get(url)

# Pass the response body (page.text), not the Response object itself
soup = Soup(page.text, "lxml")

links = [link.get('href') for link in soup.find_all('a')]

If this does not work, then submit another question with your source code and the exact error.
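Note that hrefs collected this way are often relative paths; the standard library's urllib.parse.urljoin can resolve them against the page URL (the sample URLs below are illustrative):

```python
from urllib.parse import urljoin

base = "https://stackoverflow.com/questions"
hrefs = ["/users/login", "https://example.com/x", "tagged/python"]

# urljoin leaves absolute URLs alone and resolves relative ones
# against the base URL.
absolute = [urljoin(base, h) for h in hrefs]
print(absolute)
```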

Kían