There are two elements: <div class="abc def"> and <div class="abc">.

I want to select the latter.

My code is

soup.find('div', {'class':'abc'})

However, it selects the former.

What is the correct way to do it?

jhelphenstine
Chan

4 Answers

The former element has two classes, abc and def (see e.g. How to assign multiple classes to an HTML container?), so BeautifulSoup correctly matches it: find() returns the first element that carries the class abc.

To get the second element, use findAll (spelled find_all in current versions of BeautifulSoup), which returns a list, and take its second item:

soup.findAll('div', {'class':'abc'})[1]
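
To see why find() picks the first element, it can help to inspect how the class attribute is stored. The following is a minimal sketch, assuming the two-element HTML from the question and the built-in html.parser; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Assumed markup, reconstructed from the question
html = '<div class="abc def"></div><div class="abc"></div>'
soup = BeautifulSoup(html, "html.parser")

# class is a multi-valued attribute, so bs4 stores it as a list of names
first = soup.find("div", {"class": "abc"})
print(first["class"])  # ['abc', 'def']

# find_all returns every element carrying the class, in document order
matches = soup.find_all("div", {"class": "abc"})
second = matches[1]
print(second["class"])  # ['abc']
```

Note that indexing with [1] depends on the order of the elements in the document, so it only works while the target stays the second match.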
cap.py

From the official documentation:

You can also search for the exact string value of the class attribute:

css_soup.find_all("p", class_="body strikeout")
# [<p class="body strikeout"></p>]
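# Caveat: a single class name matches any element that carries that class,
# so this still returns both <div class="abc def"> and <div class="abc">.
soup.find_all("div", class_="abc")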
pymym213

Try :nth-of-type(2) or :nth-child(2) with a CSS selector.

print(soup.select_one('.abc:nth-of-type(2)'))

Example:

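from bs4 import BeautifulSoup

html = '''<div class="abc def"></div>
        <div class="abc"></div>'''

soup = BeautifulSoup(html, 'html.parser')
# Selects an element with class abc that is the second <div> among its siblings
print(soup.select_one('.abc:nth-of-type(2)'))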

Edited:

print(soup.select_one('.abc:not(.def)'))
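
The position-based selector only works while the target stays the second div among its siblings, whereas the :not() form excludes one named class. As a sketch of a third option, and assuming the same two-element HTML, a CSS attribute selector compares the whole class attribute string, which gives an exact match without listing the classes to exclude:

```python
from bs4 import BeautifulSoup

# Assumed markup, reconstructed from the question
html = '''<div class="abc def"></div>
<div class="abc"></div>'''
soup = BeautifulSoup(html, "html.parser")

# Excludes elements that also carry the class def
not_def = soup.select_one("div.abc:not(.def)")

# Matches only when the full class attribute value is exactly "abc"
exact = soup.select_one('div[class="abc"]')

print(not_def)
print(exact)
```

The attribute selector is the stricter of the two: it also rejects elements with any other extra class, not just def.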
KunduK

To get an exact class match, you can use the following lambda expression as a filter.

 soup.find_all(lambda x: x.name == 'div' and ' '.join(x.get('class', list())) == 'abc')

You can also wrap this in a function if you want. ' '.join(x.get('class', list())) == 'abc' joins the classes with spaces (if the element has any) and checks whether the result is equal to 'abc'.

Example

from bs4 import BeautifulSoup
html = """
<div class = "abc def"></div>
<div class = "abc"></div>
<div></div>
"""
soup = BeautifulSoup(html, 'html.parser')
print(
    soup.find_all(
        # Keep only <div> elements whose full class list is exactly 'abc'
        lambda x: x.name == 'div' and ' '.join(x.get('class', list())) == 'abc'
    )
)

Output

[<div class="abc"></div>]
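
Wrapping the filter in a function, as suggested above, might look like the sketch below. The helper name has_exact_classes is hypothetical (it is not part of BeautifulSoup), and it compares the class list directly rather than joining it into a string.

```python
from bs4 import BeautifulSoup


def has_exact_classes(tag_name, *classes):
    """Hypothetical helper: build a filter for find_all that accepts only
    tags named tag_name whose class list equals classes, in that order."""
    wanted = list(classes)

    def predicate(tag):
        # Tags without a class attribute get an empty list and never match
        return tag.name == tag_name and tag.get("class", []) == wanted

    return predicate


html = """
<div class="abc def"></div>
<div class="abc"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")
result = soup.find_all(has_exact_classes("div", "abc"))
print(result)  # [<div class="abc"></div>]
```

The comparison is order-sensitive, so has_exact_classes("div", "def", "abc") would not match class="abc def"; comparing sets instead would relax that if needed.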

Bitto