7

How do I get the href of every a tag under the class "subforum" in the given code?

<li class="subforum">
<a href="Link1">Link1 Text</a>
</li>
<li class="subforum">
<a href="Link2">Link2 Text</a>
</li>
<li class="subforum">
<a href="Link3">Link3 Text</a>
</li>

I have tried this code but obviously it didn't work.

Bs = BeautifulSoup(requests.get(url).text,"lxml")
Class = Bs.findAll('li', {'class': 'subforum"'})
for Sub in Class:
    print(Link.get('href'))
F Blanchet
Hashik

2 Answers

15

The href attribute belongs to the a tag, not the li tag. Use li.a to get the a tag inside each li.

Documentation: Navigating using tag names (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names)

import bs4

html = '''<li class="subforum">
<a href="Link1">Link1 Text</a>
</li>
<li class="subforum">
<a href="Link2">Link2 Text</a>
</li>
<li class="subforum">
<a href="Link3">Link3 Text</a>
</li>'''

soup = bs4.BeautifulSoup(html, 'lxml')
for li in soup.find_all(class_="subforum"):
    print(li.a.get('href'))

Output:

Link1
Link2
Link3

Why use class_:

It’s very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, class, is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_.
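As a rough illustration of the passage above, here is a minimal sketch comparing the class_ keyword with the attribute-dictionary form used in the question. The sample markup is made up for the example, and it assumes the built-in "html.parser" rather than lxml so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: one matching li and one non-matching li.
html = (
    '<ul>'
    '<li class="subforum"><a href="Link1">Link1 Text</a></li>'
    '<li class="other"><a href="Link2">Link2 Text</a></li>'
    '</ul>'
)

soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ avoids the clash with Python's reserved word "class".
by_keyword = soup.find_all("li", class_="subforum")

# Dictionary form: the attribute name is a plain string, so no clash occurs.
by_dict = soup.find_all("li", {"class": "subforum"})

# Both forms select the same elements.
hrefs = [li.a.get("href") for li in by_keyword]
print(hrefs)
```

Both calls should return the same single li here, so either style works; class_ is just shorter to type.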

宏杰李
  • li.a.get, so basically this is like navigation? – Hashik Jan 19 '17 at 15:29
  • @Hashik Donthineni yes, https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names – 宏杰李 Jan 19 '17 at 15:30
  • Can you elaborate a little bit on "class_"? I mean, we usually use class for selecting a class, right? Yes, please go ahead and close it; it worked like a charm. – Hashik Jan 19 '17 at 15:38
3

You are almost there, you just need to find an a element for every li you've located:

Class = Bs.findAll('li', {'class': 'subforum'})  # note: no stray double quote inside the class name
for Sub in Class:
    print(Sub.find("a").get('href'))  # or Sub.a.get('href')

But, there is an easier way - a CSS selector:

for a in Bs.select("li.subforum a"):
    print(a.get('href'))

Here, li.subforum a matches every a element that is a descendant of an li element with the subforum class.
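If it helps, here is a small self-contained sketch of the selector approach. The markup is invented for the example, the parser is assumed to be the built-in "html.parser", and the [href] attribute filter is an optional addition (not part of the selector above) that skips a elements without an href.

```python
from bs4 import BeautifulSoup

# Invented sample markup, including an li without a link and an li
# with a different class, to show what the selector skips.
html = """
<ul>
<li class="subforum"><a href="Link1">Link1 Text</a></li>
<li class="subforum"><a href="Link2">Link2 Text</a></li>
<li class="subforum"><span>No link here</span></li>
<li class="other"><a href="Link4">Link4 Text</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# "li.subforum a[href]": a elements that carry an href attribute and
# sit somewhere inside an li with the class "subforum".
links = [a["href"] for a in soup.select("li.subforum a[href]")]
print(links)
```

Because the selector already guarantees an href is present, indexing with a["href"] is safe in this sketch.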

As a side note, in BeautifulSoup 4, findAll() was renamed to find_all(). Also, you should follow Python's general variable naming guidelines (PEP 8), which recommend lowercase names with underscores for variables.
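For what it's worth, the loop from the question could look roughly like this with find_all() and lowercase names. This is only a sketch: the function name subforum_links and the sample markup are hypothetical, it parses an HTML string instead of fetching a URL, and it assumes the built-in "html.parser".

```python
from bs4 import BeautifulSoup


def subforum_links(html):
    """Return the href of the first a tag in each li with class "subforum".

    Hypothetical helper, named for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all("li", class_="subforum"):
        anchor = item.find("a")
        # Skip list items that have no link, or a link without an href.
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
    return links


sample = """
<li class="subforum"><a href="Link1">Link1 Text</a></li>
<li class="subforum"><a href="Link2">Link2 Text</a></li>
<li class="subforum">no link</li>
"""
print(subforum_links(sample))
```

The None check matters because item.find("a") returns None when an li has no link, and calling .get() on None would raise an AttributeError.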

Community
alecxe