How do I extract all the links in a certain section of a web page with BeautifulSoup?

Question

I need to extract only the links in a certain section of a web page, but all the BeautifulSoup tutorials I find scrape the whole page.

How do I scrape only the links within a particular <div class="xyz">?

EDIT: I currently have this code:

soup1.find_all('h3', class_="entry-title td-module-title")

This finds, across the whole page, every <h3> element with class_="entry-title td-module-title"; these headings contain the links I want.

I still want the links contained in the class "entry-title td-module-title", but only those inside the section represented by:

<div class="wpb_wrapper">

(Sorry if my question lacked information; I have tried to add more details.)

`soup.findAll("div", {"class": "xyz"})` should work. Store the result in a variable, then scrape the individual links inside each matching section. — Suraj, Jul 11 '20 at 09:57
Does this answer your question? [How to find elements by class](https://stackoverflow.com/questions/5041008/how-to-find-elements-by-class) — MrNobody33, Jul 11 '20 at 10:01
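A minimal sketch of the two-step approach the first comment describes: store the matching sections in a variable, then search for links only inside them. The HTML is a made-up stand-in for the real page; `xyz` is the question's placeholder class and the URLs are invented. `href=True` is a standard `find_all` filter that skips `<a>` tags without an `href` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: "xyz" is the question's placeholder class,
# the URLs are invented for illustration.
html = """
<div class="xyz">
  <a href="https://example.com/inside-1">Inside 1</a>
  <a href="https://example.com/inside-2">Inside 2</a>
</div>
<div class="other">
  <a href="https://example.com/outside">Outside</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: store every matching section in a variable.
sections = soup.find_all("div", {"class": "xyz"})

# Step 2: look for links only inside those sections.
section_links = []
for section in sections:
    for a in section.find_all("a", href=True):  # skip <a> tags without an href
        section_links.append(a["href"])

print(section_links)
```

The link in `<div class="other">` is never visited, because the second search runs on each section rather than on the whole `soup`.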

Abdul Rauf · Accepted Answer · 2020-07-11T10:25:08.247

score 1

Try this:

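# soup1 is the BeautifulSoup object for the whole page
soup2 = soup1.find_all('div', class_='wpb_wrapper')  # every <div class="wpb_wrapper"> section
results = []
for div in soup2:
    # search only inside this section for the title headings
    required = div.find_all('h3', class_="entry-title td-module-title")
    results.append(required)  # one list of <h3> tags per section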
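The loop above collects the `<h3>` tags themselves (one list per section) rather than the URLs. A sketch of one way to go a step further and pull out the `href` values, assuming each heading wraps a single `<a>` tag. The HTML below is invented to stand in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two headings inside the wanted section, one outside it.
html = """
<div class="wpb_wrapper">
  <h3 class="entry-title td-module-title"><a href="/post-a">Post A</a></h3>
  <h3 class="entry-title td-module-title"><a href="/post-b">Post B</a></h3>
</div>
<div class="sidebar">
  <h3 class="entry-title td-module-title"><a href="/post-c">Post C</a></h3>
</div>
"""
soup1 = BeautifulSoup(html, "html.parser")

links = []
for div in soup1.find_all("div", class_="wpb_wrapper"):
    for h3 in div.find_all("h3", class_="entry-title td-module-title"):
        a = h3.find("a", href=True)  # first link in the heading, if any
        if a is not None:
            links.append(a["href"])

print(links)  # ['/post-a', '/post-b']
```

If the page nests one `wpb_wrapper` div inside another, a heading in the inner div is found once per enclosing div, so the list may contain duplicates.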

answered Jul 11 '20 at 10:01 by Abdul Rauf, edited Jul 11 '20 at 10:25

I solved it in a different way, but your answer was still very useful – 9879ypxkj Jul 12 '20 at 13:06

score 0 · Answer 2 · answered Jul 11 '20 at 10:46

You can use a CSS selector for this task:

# soup is the BeautifulSoup object for the whole page; a[href] skips anchors without an href
for link in soup.select('div.wpb_wrapper h3.entry-title.td-module-title a[href]'):
    print(link['href'])

This prints the href of every link inside an <h3 class="entry-title td-module-title"> that is itself inside a <div class="wpb_wrapper">.
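The CSS selector and the `find_all` approach are not quite equivalent. A class selector such as `h3.entry-title.td-module-title` matches any `<h3>` that has both classes in any order, while `class_="entry-title td-module-title"` in `find_all` matches only that exact attribute string. The sketch below uses an invented snippet in which one heading lists its classes in the opposite order.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second heading lists its classes in reverse order,
# and the last heading sits outside the wanted section.
html = """
<div class="wpb_wrapper">
  <h3 class="entry-title td-module-title"><a href="/post-a">Post A</a></h3>
  <h3 class="td-module-title entry-title"><a href="/post-b">Post B</a></h3>
</div>
<h3 class="entry-title td-module-title"><a href="/outside">Outside</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: the <h3> needs both classes, order does not matter.
css_hrefs = [
    a["href"]
    for a in soup.select("div.wpb_wrapper h3.entry-title.td-module-title a[href]")
]

# find_all with a space-separated string: exact match on the class attribute.
exact_hrefs = [
    h3.a["href"]
    for div in soup.find_all("div", class_="wpb_wrapper")
    for h3 in div.find_all("h3", class_="entry-title td-module-title")
]

print(css_hrefs)    # ['/post-a', '/post-b']
print(exact_hrefs)  # ['/post-a']
```

On a page where the class attribute is always written the same way, both give the same result; the selector is simply more forgiving when it is not.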
