Try changing code2 to soup.find_all('h2', class_='a b')
Example:
Given four h2 tags with their classes, soup.find_all('h2', class_='a b') returns only the first of them, because it is the only one whose class attribute matches the filter exactly.
To get the text of each h2 element, use .text. Because find_all() returns a list of results, we have to loop over it, which I have done with a list comprehension:
[heading.text for heading in soup.find_all('h2', class_='a b')]
from bs4 import BeautifulSoup
html = """
<h2 class="a b"> Heading a and b </h2>
<h2 class="b a"> Heading b and a </h2>
<h2 class="a"> Heading a </h2>
<h2 class="b"> Heading b </h2>
"""
soup = BeautifulSoup(html, 'html.parser')
# print() so the result is also shown when run as a script, not only in a REPL
print([heading.text for heading in soup.find_all('h2', class_='a b')])
Output
[' Heading a and b ']
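As a side note, class_='a b' is compared against the exact string of the class attribute, which is why class="b a" is not matched above. A minimal sketch, assuming the same four h2 tags, of how a CSS selector via select() matches both classes regardless of order, and how get_text(strip=True) drops the surrounding whitespace:

```python
from bs4 import BeautifulSoup

html = """
<h2 class="a b"> Heading a and b </h2>
<h2 class="b a"> Heading b and a </h2>
<h2 class="a"> Heading a </h2>
<h2 class="b"> Heading b </h2>
"""
soup = BeautifulSoup(html, 'html.parser')

# Exact string match on the class attribute: only class="a b"
exact = [h.get_text(strip=True) for h in soup.find_all('h2', class_='a b')]

# CSS selector: every h2 that has both classes, in any order
both = [h.get_text(strip=True) for h in soup.select('h2.a.b')]

# A single class name matches every h2 that has that class among its classes
only_a = [h.get_text(strip=True) for h in soup.find_all('h2', class_='a')]

print(exact)
print(both)
print(only_a)
```

If the order of the classes in the page is not guaranteed, the select() variant is probably the safer choice.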
Further thoughts
You say that it does not work for you. Without further code or information it is hard to help, and anything more is guesswork. Let me show you what could also be a reason:
Let's say you are scraping Google results. There are a lot of ways to do that; I just want to show two approaches, requests and selenium.
Requests Example
The classes inspected for h3 in the browser are LC20lb DKV0Md.
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.google.com/search?q=stackoverflow')
soup = BeautifulSoup(r.content, 'lxml')
headingsH3Class = soup.find_all('h3', class_='LC20lb DKV0Md')
headingsH3Only = soup.find_all('h3')
print(headingsH3Class[:2])
print(headingsH3Only[:2],'\n')
Requests Example Output
[<h3 class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Stack Overflow</div></h3>, <h3 class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Stack Overflow (Website) – Wikipedia</div></h3>]
Selenium Example
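from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
url = 'https://www.google.com/search?q=stackoverflow'
# Selenium 4: pass the driver path through a Service object instead of executable_path
browser = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))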
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'lxml')
headingsH3Class = soup.find_all('h3', class_='LC20lb DKV0Md')
headingsH3Only = soup.find_all('h3')
print(headingsH3Class[:2])
print(headingsH3Only[:2])
browser.close()
Selenium Example Output
- A list with exactly the h3 elements that carry both classes we searched for:
[<h3 class="LC20lb DKV0Md"><span>Stack Overflow - Where Developers Learn, Share, & Build ...</span></h3>, <h3 class="LC20lb DKV0Md"><span>Stack Overflow (Website) – Wikipedia</span></h3>]
- A list with all h3 elements:
[<h3 class="LC20lb DKV0Md"><span>Stack Overflow - Where Developers Learn, Share, & Build ...</span></h3>, <h3 class="r"><a class="l" data-ved="2ahUKEwj426uv9u3tAhUPohQKHYymBMAQjBAwAXoECAcQAQ" href="https://stackoverflow.com/questions" ping="/url?sa=t&source=web&rct=j&url=https://stackoverflow.com/questions&amp;ved=2ahUKEwj426uv9u3tAhUPohQKHYymBMAQjBAwAXoECAcQAQ">Questions</a></h3>]
Conclusion
Always check the data you are actually scraping, because the response you receive and what you inspect in the browser can differ.
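One way to do that check, as a rough sketch: count which class combinations the tag really carries in the HTML you received before filtering on a class copied from the browser. The sample_response string and the class_combinations helper are made up for illustration; in practice you would pass r.content or browser.page_source instead.

```python
from collections import Counter
from bs4 import BeautifulSoup


def class_combinations(html, tag):
    """Count the class attribute values found on `tag` in the given HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # el.get('class', []) is a list of class names, or [] if the attribute is missing
    return Counter(' '.join(el.get('class', [])) for el in soup.find_all(tag))


# Hypothetical stand-in for a response body (not real Google output)
sample_response = """
<h3 class="zBAuLc"><div>Stack Overflow</div></h3>
<h3 class="zBAuLc"><div>Stack Overflow (Website)</div></h3>
<h3><div>No class at all</div></h3>
"""

counts = class_combinations(sample_response, 'h3')
print(counts)
```

If the class you inspected in the browser does not show up in these counts, find_all() with that class will return an empty list for this response.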