How to get links to all articles on a website?

Question

Hi, I am really new to BS4 and Selenium. I was wondering whether there is a way to get links to all the articles on a website.

For instance, https://uk.yahoo.com has many news articles. How can I get a list of links to all of them, or is that even possible?

Does this answer your question? [retrieve links from web page using python and BeautifulSoup](https://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup) — mindless-overflow, Jan 21 '20 at 00:50

score 1 · Answer 1 · answered Jan 21 '20 at 00:55

Try this. Add your own user agent string.

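```python
import re

import requests
from bs4 import BeautifulSoup

# Put your own browser's User-Agent string here instead of the empty string.
headers = {'User-Agent': ''}
response = requests.get('https://uk.yahoo.com', headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Collect the href of every <a> tag whose link is absolute (starts with https://).
links = []
for link in soup.find_all('a', attrs={'href': re.compile('^https://')}):
    links.append(link.get('href'))
print(links)
```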
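The loop above keeps only hrefs that already start with `https://`, so relative links such as `/news/...` are skipped, and the same article can be listed more than once. Below is a minimal sketch of a variant, offered as an assumption rather than a tested recipe for Yahoo. It resolves relative links with `urllib.parse.urljoin`, drops `#fragments` and duplicates, and optionally filters with a regex. The `extract_links` helper and the `\.html$` pattern are hypothetical: inspect the real article URLs on the site and adjust the pattern to match them. The function parses an HTML string, so you can pass it `response.text` from the request above.

```python
import re
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_links(html, base_url, pattern=None):
    """Hypothetical helper: return unique absolute http(s) links in page order.

    Relative hrefs are resolved against base_url and '#fragment' parts are
    dropped. If pattern is given, a link is kept only when re.search(pattern,
    url) matches (assumed to be something that identifies article pages).
    """
    soup = BeautifulSoup(html, 'html.parser')
    seen = set()
    links = []
    for a in soup.find_all('a', href=True):
        # Make the link absolute and strip any fragment.
        url = urljoin(base_url, a['href']).split('#', 1)[0]
        # Skip mailto:, javascript:, and other non-web links.
        if urlparse(url).scheme not in ('http', 'https'):
            continue
        if pattern is not None and not re.search(pattern, url):
            continue
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Made-up sample HTML for illustration; real pages will look different.
sample = '''
<a href="/news/story-one-123.html">One</a>
<a href="https://uk.news.yahoo.com/story-two-456.html#comments">Two</a>
<a href="/news/story-one-123.html">One again</a>
<a href="mailto:someone@example.com">Mail</a>
<a href="/weather">Weather</a>
'''
print(extract_links(sample, 'https://uk.yahoo.com', pattern=r'\.html$'))
# ['https://uk.yahoo.com/news/story-one-123.html', 'https://uk.news.yahoo.com/story-two-456.html']
```

This only sees links present in the HTML that `requests` downloads. Links that the page adds with JavaScript after loading will not be there; for those you would need a browser-driven tool such as Selenium.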
