There are two elements : <div class = "abc def">
and <div class = "abc">
I want to select the latter.
My code is
soup.find('div', {'class':'abc'})
However, it selects the former.
What is the correct way to do it?
The former element has two classes, abc and def (see e.g. How to assign multiple classes to an HTML container?), so BeautifulSoup correctly returns it when you use find(): the class filter matches an element if any one of its classes is abc.
To get the second element, use find_all (findAll is its older alias), which returns a list, and take the second item:
soup.find_all('div', {'class':'abc'})[1]
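To make the behaviour concrete, here is a minimal sketch using the two divs from the question (the variable names are just for illustration). It shows that the class filter matches both elements, so indexing only works if you know the position of the one you want.

```python
from bs4 import BeautifulSoup

# The two elements from the question.
html = '<div class="abc def"></div><div class="abc"></div>'
soup = BeautifulSoup(html, "html.parser")

# The class filter matches any element that has "abc" among its classes.
matches = soup.find_all("div", {"class": "abc"})
first = soup.find("div", {"class": "abc"})  # same as matches[0]
second = matches[1]

print(len(matches))     # 2
print(first["class"])   # ['abc', 'def']
print(second["class"])  # ['abc']
```

Relying on the index is fragile if the page layout changes; an exact class match is usually more robust.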
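From the official docs:
You can also search for the exact string value of the class attribute:
css_soup.find_all("p", class_="body strikeout") # [<p class="body strikeout"></p>]
Applied to this question:
soup.find_all("div", class_="abc")
Note that a single-class string such as "abc" also matches any element that has abc among its classes, so this call returns both divs, not only the second one.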
Try :nth-of-type(2) or :nth-child(2) with a CSS selector.
print(soup.select_one('.abc:nth-of-type(2)'))
Example:
from bs4 import BeautifulSoup
html = '''<div class = "abc def"></div>
<div class = "abc"></div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select_one('.abc:nth-of-type(2)'))
Edit: alternatively, exclude elements that also have the def class:
print(soup.select_one('.abc:not(.def)'))
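The :not(.def) selector works when the unwanted class is known in advance. As a hedged sketch of another option, an attribute selector compares the whole class attribute value, so it should match only the element whose class is exactly abc. This assumes a bs4 version with CSS selector support via soupsieve; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = '<div class="abc def"></div><div class="abc"></div>'
soup = BeautifulSoup(html, "html.parser")

# Exclude a known extra class.
by_not = soup.select_one("div.abc:not(.def)")

# Compare the full attribute value instead of a single class token.
by_attr = soup.select_one('div[class="abc"]')

print(by_not)
print(by_attr)
```

The attribute selector is sensitive to class order and count, which is exactly what an exact match requires here.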
To get an exact class match, you can use the following lambda expression as a filter.
soup.find_all(lambda x: x.name == 'div' and ''.join(x.get('class', list())) == 'abc')
You can also wrap this in a function if you want. ''.join(x.get('class', list())) == 'abc' joins the classes (if there are any) and checks whether the result is equal to 'abc'.
Example
from bs4 import BeautifulSoup
html = """
<div class = "abc def"></div>
<div class = "abc"></div>
<div></div>
"""
soup = BeautifulSoup(html, 'html.parser')
print(
    soup.find_all(
        lambda x: x.name == 'div' and ''.join(x.get('class', list())) == 'abc'
    )
)
Output
[<div class="abc"></div>]
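One caveat with joining on the empty string: an element with class="a bc" also joins to 'abc' and would match. A possible variant, sketched here with a hypothetical helper named has_exact_classes (not part of BeautifulSoup), compares the class list directly instead.

```python
from bs4 import BeautifulSoup


def has_exact_classes(tag, name, classes):
    """Hypothetical helper: True if tag is <name> with exactly these classes, in order."""
    return tag.name == name and tag.get("class", []) == list(classes)


html = """
<div class="abc def"></div>
<div class="abc"></div>
<div class="a bc"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all accepts a function that is called with each tag.
exact = soup.find_all(lambda tag: has_exact_classes(tag, "div", ["abc"]))
print(exact)  # [<div class="abc"></div>]
```

Comparing lists keeps the class boundaries intact, so "a bc" and "abc" are not confused.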