How to get specific links with BeautifulSoup?

Question

I am trying to scrape an HTML page with Python using BeautifulSoup.
I need to get the href of specific <a> tags.

This is my test markup. I want to get the links of the form <a href="/example/test/link/activity1~10" target="_blank">:

<div class="listArea">
   <div class="activity_sticky" id="activity">
   .
   .
   </div>
   <div class="activity_content activity_loaded">
      <div class="activity-list-item activity_item__1fhpg">
         <div class="activity-list-item_activity__3FmEX">
            <div>...</div>
            <a href="/example/test/link/activity1" target="_blank">
               <div class="activity-list-item_addr">
                  <span> 0x1292311</span>
               </div>
            </a>
         </div>
      </div>
      <div class="activity-list-item activity_item__1fhpg">
         <div class="activity-list-item_activity__3FmEX">
            <div>...</div>
            <a href="/example/test/link/activity2" target="_blank">
               <div class="activity-list-item_addr">
                  <span> 0x1292312</span>
               </div>
            </a>
         </div>
      </div>
      .
      .
      .
   </div>
</div>

What have you tried? The BeautifulSoup documentation has many examples, including this situation. — Tim Roberts, Feb 21 '22 at 05:57
Does this answer your question? [retrieve links from web page using python and BeautifulSoup](https://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup) — nathan liang, Feb 21 '22 at 06:02

score 2 · Answer 1 · answered Feb 21 '22 at 06:00


Check the main page of the bs4 documentation:

for link in soup.find_all('a'):
    print(link.get('href'))
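
As a self-contained variation on the loop above, here is a minimal sketch that skips anchors without an href, so no None values end up in the result. The sample markup is hypothetical and only modeled on the question's snippet.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the question's snippet.
html = '''
<div class="activity_content">
  <a href="/example/test/link/activity1" target="_blank">first</a>
  <a name="no-href-anchor">skipped</a>
  <a href="/example/test/link/activity2" target="_blank">second</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that actually carry an href attribute.
hrefs = [a['href'] for a in soup.find_all('a', href=True)]
print(hrefs)
```

Passing href=True is one way to avoid checking for None afterwards; filtering the result of link.get('href') would work as well.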

nathan liang


score 0 · Answer 2 · answered Feb 21 '22 at 06:02

This code solves the problem: find all <a> tags, then read the href of each one whose target is "_blank".

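from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # html holds the page source
for i in soup.find_all('a'):
    # .get() returns None instead of raising KeyError when the attribute is missing
    if i.get('target') == "_blank":
        print(i.get('href'))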

Hope this helps.

HedgeHog · Answer 3 · 2022-02-21T07:20:19.987

Select the specific <a> tags. As an alternative to @Mason Ma's answer, you can also use CSS selectors:

soup.select('.activity_content a')

or select by the target attribute:

soup.select('.activity_content a[target="_blank"]')

Example

This gives you a list of links matching your condition:

from bs4 import BeautifulSoup

html = '''
<div class="activity_content activity_loaded">
      <div class="activity-list-item activity_item__1fhpg">
         <div class="activity-list-item_activity__3FmEX">
            <div>...</div>
            <a href="/example/test/link/activity1" target="_blank">
               <div class="activity-list-item_addr">
                  <span> 0x1292311</span>
               </div>
            </a>
         </div>
      </div>
      <div class="activity-list-item activity_item__1fhpg">
         <div class="activity-list-item_activity__3FmEX">
            <div>...</div>
            <a href="/example/test/link/activity2" target="_blank">
               <div class="activity-list-item_addr">
                  <span> 0x1292312</span>
               </div>
            </a>
         </div>
      </div>
'''
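# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')

print([x['href'] for x in soup.select('.activity_content a[target="_blank"]')])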

Output

['/example/test/link/activity1', '/example/test/link/activity2']
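
If other target="_blank" links can appear inside the same container, the selector can also be narrowed by the href itself. The following is a sketch under that assumption; the markup and the /other/page link are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one unrelated link sits between the activity links.
html = '''
<div class="activity_content activity_loaded">
  <a href="/example/test/link/activity1" target="_blank">a</a>
  <a href="/other/page" target="_blank">b</a>
  <a href="/example/test/link/activity2" target="_blank">c</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# [href^="..."] is the CSS "attribute value starts with" selector.
selector = '.activity_content a[href^="/example/test/link/activity"]'
activity_links = [a['href'] for a in soup.select(selector)]
print(activity_links)
```

The prefix match keeps the selection tied to the URL pattern rather than to the target attribute, which may be the more stable choice if the page's markup changes.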

Einstein EBEREONWU · Answer 4 · 2022-09-13T11:18:58.777


Based on my understanding of your question, you're trying to extract the links (href) from anchor tags whose target value is _blank. You can do this by passing an attribute filter to find_all, so that only anchors with target == '_blank' are returned:

# find_all is the current name; findAll is a legacy alias for the same method
links = soup.find_all('a', attrs={'target': '_blank'})
for link in links:
    print(link.get('href'))
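
If "activity1~10" in the question means the links numbered 1 through 10 (an assumption, since the question does not spell it out), the attribute filter can be combined with a regular expression on href. In this sketch the helper name activity_links, its low/high parameters, and the sample markup are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with links inside and outside the assumed 1..10 range.
html = '''
<a href="/example/test/link/activity1" target="_blank">1</a>
<a href="/example/test/link/activity10" target="_blank">10</a>
<a href="/example/test/link/activity11" target="_blank">11</a>
<a href="/about" target="_blank">about</a>
'''

# Captures the number at the end of an activity link.
pattern = re.compile(r'^/example/test/link/activity(\d+)$')

def activity_links(soup, low=1, high=10):
    """Return hrefs of target="_blank" activity links numbered low..high."""
    links = []
    # A compiled regex as the href filter keeps only matching anchors.
    for a in soup.find_all('a', href=pattern, target='_blank'):
        number = int(pattern.match(a['href']).group(1))
        if low <= number <= high:
            links.append(a['href'])
    return links

soup = BeautifulSoup(html, 'html.parser')
result = activity_links(soup)
print(result)
```

Extracting the number and comparing it as an integer avoids writing the range into the regular expression itself, so the bounds can be changed without touching the pattern.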

edited Sep 13 '22 at 11:18

answered Sep 12 '22 at 18:35

