How to scrape an element from a website which belongs to more than one class using BeautifulSoup

Question

I am trying to scrape elements from a website.

<h2 class="a b" data-test-search-result-header-title> Heading </h2>

How can I extract the value Heading from the website using BeautifulSoup?

I have tried the following codes :

Code 1 :

soup.find_all(h2,{'class':['a','b']})

Code 2:

soup.find_all(h2,class_='a b'})

Both the codes return an empty list. How to resolve this?

Multiple class names will never get you the element. Do you have some other way to look it up besides the class name? — Abhishek Rai, Dec 26 '20 at 21:20
In [a relevant thread](https://www.iditect.com/how-to/55015704.html), someone suggests using the CSS selector, as in `soup.select("h2.a.b")`. — niamulbengali, Dec 26 '20 at 21:21
`Code 1` gives me this element. But it gives also elements with `class="a"` and `class="b"` — furas, Dec 26 '20 at 21:33
Does this answer your question? [BeautifulSoup findAll() given multiple classes?](https://stackoverflow.com/questions/18725760/beautifulsoup-findall-given-multiple-classes) — Prayson W. Daniel, Dec 26 '20 at 21:35
@Prayson W. Daniel In that question the problem is to belong to any of the classes. In this case the element should belong to both the classes — Pravallika Myneni, Dec 26 '20 at 22:06
Yes, but the thread discussion answers both questions. Let me know if none work and I will attempt to solve your issue. — Prayson W. Daniel, Dec 27 '20 at 06:41
@PravallikaMyneni : Updated my answer and added further information - Could you provide more code / a minimal functional example, please. — HedgeHog, Dec 27 '20 at 10:43

HedgeHog · Answer 1 · 2020-12-27T11:19:12.407

Fix Code 2 by quoting the tag name and removing the stray closing brace: soup.find_all('h2',class_='a b')

Example:

Given are four h2 tags with different classes. soup.find_all('h2',class_='a b') returns only the first of them, because a class_ string that contains a space is compared against the exact value of the class attribute, so "b a" does not match.

To get the text of each h2 element, use .text. Because find_all() returns a list, loop over the result:

[heading.text for heading in soup.find_all('h2',class_='a b')]

from bs4 import BeautifulSoup

html = """
<h2 class="a b"> Heading a and b </h2>
<h2 class="b a"> Heading b and a </h2>
<h2 class="a"> Heading a </h2>
<h2 class="b"> Heading b </h2>
"""

soup=BeautifulSoup(html,'html.parser')

print([heading.text for heading in soup.find_all('h2',class_='a b')])

Output

[' Heading a and b ']
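
As the comments on the question point out, the exact-string match above misses the heading whose classes are written in the other order ("b a"). Here is a minimal sketch of two alternatives, reusing the example HTML: a CSS selector through select(), which requires both classes in any order, and, for contrast, the list filter from Code 1, which matches either class. The variable names both_classes and any_class are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<h2 class="a b"> Heading a and b </h2>
<h2 class="b a"> Heading b and a </h2>
<h2 class="a"> Heading a </h2>
<h2 class="b"> Heading b </h2>
"""

soup = BeautifulSoup(html, 'html.parser')

# CSS selector: the element must carry both classes, in any order.
both_classes = [h.get_text(strip=True) for h in soup.select('h2.a.b')]

# List filter: the element matches if it carries ANY of the listed classes.
any_class = [h.get_text(strip=True) for h in soup.find_all('h2', {'class': ['a', 'b']})]

print(both_classes)  # ['Heading a and b', 'Heading b and a']
print(len(any_class))  # 4
```

get_text(strip=True) also drops the surrounding whitespace, so the first result is 'Heading a and b' rather than ' Heading a and b '.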

Further thoughts

You say that it does not work for you. Without further code or information it is hard to help, and anything more is guessing. Let me show you what could also be a reason:

Let's say you are scraping Google results. There are many ways to do that; I just want to show two approaches, requests and selenium.

Requests Example

The classes of the h3 elements, as inspected in the browser, are LC20lb DKV0Md.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.google.com/search?q=stackoverflow')
soup = BeautifulSoup(r.content, 'lxml')
headingsH3Class = soup.find_all('h3', class_='LC20lb DKV0Md')
headingsH3Only = soup.find_all('h3')

print(headingsH3Class[:2])
print(headingsH3Only[:2],'\n')

Requests Example Output

An empty list

[]
A list showing that the inspected classes are not in the page content returned by requests:

[<h3 class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Stack Overflow</div></h3>, <h3 class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Stack Overflow (Website) – Wikipedia</div></h3>]

Selenium Example

from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://www.google.com/search?q=stackoverflow'

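# Note: Selenium 4 deprecated executable_path (later releases removed it) in favor of webdriver.Chrome(service=Service(path)).
browser = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')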
browser.get(url)

soup = BeautifulSoup(browser.page_source, 'lxml')
headingsH3Class = soup.find_all('h3', class_='LC20lb DKV0Md')
headingsH3Only = soup.find_all('h3')

print(headingsH3Class[:2])
print(headingsH3Only[:2])
browser.close()

Selenium Example Output

A list containing exactly the h3 elements that carry both classes we searched for:

[<h3 class="LC20lb DKV0Md"><span>Stack Overflow - Where Developers Learn, Share, &amp; Build ...</span></h3>, <h3 class="LC20lb DKV0Md"><span>Stack Overflow (Website) – Wikipedia</span></h3>]

A list with all h3 elements:

[<h3 class="LC20lb DKV0Md"><span>Stack Overflow - Where Developers Learn, Share, &amp; Build ...</span></h3>, <h3 class="r"><a class="l" data-ved="2ahUKEwj426uv9u3tAhUPohQKHYymBMAQjBAwAXoECAcQAQ" href="https://stackoverflow.com/questions" ping="/url?sa=t&amp;source=web&amp;rct=j&amp;url=https://stackoverflow.com/questions&amp;amp;ved=2ahUKEwj426uv9u3tAhUPohQKHYymBMAQjBAwAXoECAcQAQ">Questions</a></h3>]

Conclusion

Always check the data you are actually scraping, because the response your code receives and the markup you inspect in the browser can differ.
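
One way to do that check, sketched under assumptions: count which class combinations a tag really carries in the HTML your code received, then compare them with what the browser inspector showed. The helper name class_combinations and the sample fetched_html (a stand-in shaped like the requests output above) are hypothetical and not part of any library.

```python
from collections import Counter
from bs4 import BeautifulSoup


def class_combinations(html, tag_name):
    """Count the class attribute values found on every `tag_name` element."""
    soup = BeautifulSoup(html, 'html.parser')
    return Counter(
        ' '.join(tag.get('class', []))  # empty string if the tag has no class
        for tag in soup.find_all(tag_name)
    )


# Stand-in for r.content or browser.page_source (hypothetical sample).
fetched_html = """
<h3 class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Stack Overflow</div></h3>
<h3 class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Stack Overflow (Website)</div></h3>
"""

combos = class_combinations(fetched_html, 'h3')
print(combos)

# The classes inspected in the browser are absent from this response.
print('LC20lb DKV0Md' in combos)
```

If the class combination you inspected does not show up in the counts, the selector is not the problem; the fetched page simply differs from the rendered one.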

Could you provide a URL of the website or a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), please? — HedgeHog, Dec 26 '20 at 22:13
Had a good day ;) - It is something Q&A often do not contain in detail and that is a pity, but thx a lot that you saw and appreciated it. — HedgeHog, Dec 27 '20 at 18:23
