I have the following HTML:
<div id="bodyContent" class="content mw-parser-output">
<div id="mw-content-text" style="direction: ltr;">
<h1 class="section-heading" tabindex="0" aria-haspopup="true" data-section-id="0">
<span class="mw-headline" id="title_0">pomme</span>
</h1>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="English">English</h2></summary>
<details data-level="3" open="">abc</details>
</details>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="French">French</h2></summary>
<details data-level="3" open="">abc</details>
</details>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="Norman">Norman</h2></summary>
<details data-level="3" open="">abc</details>
</details>
</div>
</div>
Inside each <details data-level="2" open=""> element there is an <h2> element, such as <h2 id="English">English</h2>, that denotes the language. My goal is to delete every <details data-level="2" open=""> element whose language is not English. My expected result is:
<div id="bodyContent" class="content mw-parser-output">
<div id="mw-content-text" style="direction: ltr;">
<h1 class="section-heading" tabindex="0" aria-haspopup="true" data-section-id="0">
<span class="mw-headline" id="title_0">pomme</span>
</h1>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="English">English</h2></summary>
<details data-level="3" open="">abc</details>
</details>
</div>
</div>
I obtain this result with:
from bs4 import BeautifulSoup
texte = """
<div id="bodyContent" class="content mw-parser-output">
<div id="mw-content-text" style="direction: ltr;">
<h1 class="section-heading" tabindex="0" aria-haspopup="true" data-section-id="0">
<span class="mw-headline" id="title_0">pomme</span>
</h1>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="English">English</h2></summary>
<details data-level="3" open="">abc</details>
</details>
</div>
</div>
"""
soup = BeautifulSoup(texte, 'html.parser')
# Each language heading sits in the <summary> of a level-2 <details> element.
for h2 in soup.select('details > summary > h2'):
    if h2.get_text() != 'English':
        # Remove the enclosing <details> element for any other language.
        h2.find_parent('details').decompose()
print(soup)
I need to repeat this operation nearly 4 million times. Is there a more efficient way to do it? Thank you so much for your help!
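For comparison, here is a minimal sketch of one possible alternative that uses only the standard library's `html.parser` and never builds a parse tree. It re-emits the input as it is read and buffers each level-2 `<details>` element until its closing tag, then keeps or drops the buffer based on the `id` of the first `<h2>` inside it. The names `LanguageFilter` and `keep_language` are hypothetical, and the sketch assumes the markup is well formed and shaped like the sample above. Whether it is actually faster than the BeautifulSoup version would need to be measured on real pages.

```python
from html.parser import HTMLParser


class LanguageFilter(HTMLParser):
    """Re-emit HTML, dropping level-2 <details> elements whose <h2> id is not `keep`."""

    def __init__(self, keep="English"):
        # convert_charrefs=False so entities are passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.keep = keep
        self.out = []     # pieces of the filtered document
        self.buf = None   # pieces of the level-2 element being read, else None
        self.depth = 0    # <details> nesting depth inside the current element
        self.lang = None  # id of the first <h2> seen in the current element

    def _emit(self, text):
        (self.out if self.buf is None else self.buf).append(text)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "details":
            if self.buf is None and attrs.get("data-level") == "2":
                self.buf, self.depth, self.lang = [], 0, None
            if self.buf is not None:
                self.depth += 1
        elif tag == "h2" and self.buf is not None and self.lang is None:
            self.lang = attrs.get("id")
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "details" and self.buf is not None:
            self.depth -= 1
            if self.depth == 0:  # the level-2 element just closed
                block, self.buf = self.buf, None
                if self.lang == self.keep:
                    self.out.extend(block)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")


def keep_language(html, keep="English"):
    parser = LanguageFilter(keep)
    parser.feed(html)
    parser.close()  # an unclosed level-2 element at end of input is dropped
    return "".join(parser.out)


sample = """<div id="mw-content-text" style="direction: ltr;">
<h1 class="section-heading"><span class="mw-headline" id="title_0">pomme</span></h1>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="English">English</h2></summary>
<details data-level="3" open="">abc</details>
</details>
<details data-level="2" open="">
<summary class="section-heading"><h2 id="French">French</h2></summary>
<details data-level="3" open="">abc</details>
</details>
</div>"""

result = keep_language(sample)
print(result)
```

Removed elements leave their surrounding newlines behind, so the output can contain blank lines where a language section used to be. Since each page is processed independently, the 4 million calls could also be spread across processes, for example with `multiprocessing.Pool`.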