How to check if a soup contains an element?

Question

I have an HTML document. I would like to check whether it contains at least one English section. Such a section is marked by

<summary class="section-heading"><h2 id="English">English</h2></summary>

This operation is performed millions of times. To be efficient, I want the check to stop as soon as it meets the first such element. I tried a method from here. Could you please explain

why soup.find('details[data-level="2"]:has(h2#English)') does not work, whereas soup.select_one('details[data-level="2"]:has(h2#English)') works perfectly?
how I can solve it?

from bs4 import BeautifulSoup

texte = """
<div id="bodyContent" class="content mw-parser-output">
    <div id="mw-content-text" style="direction: ltr;">
        <h1 class="section-heading" tabindex="0" aria-haspopup="true" data-section-id="0">
            <span class="mw-headline" id="title_0">pomme</span>
        </h1>     
        <details data-level="2" open="">
            <summary class="section-heading"><h2 id="English">English</h2></summary>
            <details data-level="3" open="">abc</details>
        </details>
        <details data-level="2" open="">
            <summary class="section-heading"><h2 id="French">French</h2></summary>
            <details data-level="3" open="">abc</details>
        </details>
    </div>
</div>
"""

soup = BeautifulSoup(texte, 'html.parser')

if soup.find('details[data-level="2"]:has(h2#English)'):  
    print('found')
else:
    print('not found')

@GiorgiImerlishvili There are millions of such HTML documents. To be robust, I would impose the strict condition `details[data-level="2"]:has(h2#English)` rather than matching on the id alone. – Akira, Apr 22 '21 at 07:54
Instead of `find`, use `select` [`see this`](https://stackoverflow.com/a/38033910/4985099) – sushanth, Apr 22 '21 at 08:00

score 1 · Answer 1 · answered Apr 22 '21 at 07:58

You can use find_all to collect the level-2 details elements and then search each one for the language you want:

from bs4 import BeautifulSoup

texte = """
<div id="bodyContent" class="content mw-parser-output">
    <div id="mw-content-text" style="direction: ltr;">
        <h1 class="section-heading" tabindex="0" aria-haspopup="true" data-section-id="0">
            <span class="mw-headline" id="title_0">pomme</span>
        </h1>     
        <details data-level="2" open="">
            <summary class="section-heading"><h2 id="English">English</h2></summary>
            <details data-level="3" open="">abc</details>
        </details>
        <details data-level="2" open="">
            <summary class="section-heading"><h2 id="French">French</h2></summary>
            <details data-level="3" open="">abc</details>
        </details>
    </div>
</div>
"""

soup = BeautifulSoup(texte, 'html.parser')
details = soup.find_all("details", {"data-level": "2"})
lang = "English"
for detail in details:
    detail_str = str(detail)
    if lang in detail_str:
        print(detail)

Outputs:

<details data-level="2" open="">
<summary class="section-heading"><h2 id="English">English</h2></summary>
<details data-level="3" open="">abc</details>
</details>
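
Note that a substring test on `str(detail)` also matches a section that merely mentions the word, and the loop keeps going after the first hit. Below is a minimal sketch of a stricter variant, assuming the section is always identified by the `id` of its `h2`. The helper name `has_section` and the `SAMPLE` markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the French section mentions the word "English" in its body.
SAMPLE = """
<details data-level="2" open="">
    <summary class="section-heading"><h2 id="French">French</h2></summary>
    <details data-level="3" open="">borrowed from English</details>
</details>
"""


def has_section(html_text, lang):
    """Return True if a level-2 <details> contains an <h2> whose id equals lang."""
    soup = BeautifulSoup(html_text, "html.parser")
    for detail in soup.find_all("details", {"data-level": "2"}):
        # Match on the heading id instead of searching the serialized markup.
        if detail.find("h2", id=lang) is not None:
            return True  # stop inspecting sections at the first match
    return False


print(has_section(SAMPLE, "French"))   # True
print(has_section(SAMPLE, "English"))  # False
```

With the substring test, the second call would have reported a match, because the text "English" appears inside the French section.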

score 1 · Answer 2 · answered Apr 22 '21 at 08:06

BeautifulSoup doesn't support XPath, so lxml can be used as an alternative.

from lxml import html
texte = """
<div id="bodyContent" class="content mw-parser-output">
    <div id="mw-content-text" style="direction: ltr;">
        <h1 class="section-heading" tabindex="0" aria-haspopup="true" data-section-id="0">
            <span class="mw-headline" id="title_0">pomme</span>
        </h1>     
        <details data-level="2" open="">
            <summary class="section-heading"><h2 id="English">English</h2></summary>
            <details data-level="3" open="">abc</details>
        </details>
        <details data-level="2" open="">
            <summary class="section-heading"><h2 id="French">French</h2></summary>
            <details data-level="3" open="">abc</details>
        </details>
    </div>
</div>
"""
tree = html.fromstring(texte)
element = tree.xpath('//details[@data-level="2"]//h2[contains(text(),"English")]')
if element:
    print("Found")
else:
    print("Not found")

score 1 · Accepted Answer · answered Apr 22 '21 at 08:07

You can try select_one instead of find, for example:

soup.select_one('details[data-level="2"] summary.section-heading h2#English')

The result will be

<h2 id="English">English</h2>

Antony Phoenix

And to answer why `soup.find('details[data-level="2"]:has(h2#English)')` does not work: `find` does not accept CSS selectors. It matches on tag names and attributes, so the whole string is treated as a tag name. [More info here](https://beautiful-soup-4.readthedocs.io/en/latest/#searching-the-tree) – Antony Phoenix Apr 22 '21 at 09:12
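
To make the difference concrete, here is a small sketch. Passing the selector string to `find` returns nothing, while a function filter expresses the same condition in terms `find` understands. The function name `is_english_section` and the trimmed `sample` string are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed sample, modelled on the markup in the question.
sample = """
<details data-level="2" open="">
    <summary class="section-heading"><h2 id="English">English</h2></summary>
    <details data-level="3" open="">abc</details>
</details>
"""

soup = BeautifulSoup(sample, "html.parser")
selector = 'details[data-level="2"]:has(h2#English)'

# find() interprets the string as a tag name, and no tag has that name.
print(soup.find(selector))  # None


def is_english_section(tag):
    """Filter for find(): a level-2 <details> containing <h2 id="English">."""
    return (
        tag.name == "details"
        and tag.get("data-level") == "2"
        and tag.find("h2", id="English") is not None
    )


# find() accepts a function and returns the first tag for which it is true.
match = soup.find(is_english_section)
print(match is not None)  # True
```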
