1

I have some HTML inside a div with class content that looks like this:

<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>

My Python code is

soup.find('div',attrs={'class':'content'}).h2.text

And it returns

Brookstone
                         AS20194 Multi-functional Massage Chair

How should I update the code so that it returns only

AS20194 Multi-functional Massage Chair
letsintegreat
PURWU
  • Does this answer your question? [Exclude unwanted tag on Beautifulsoup Python](https://stackoverflow.com/questions/40760441/exclude-unwanted-tag-on-beautifulsoup-python) – sushanth Jun 09 '20 at 16:10
  • No. The answer in that thread was about excluding the strong tag, but I am asking how to get the text that is not inside strong. – PURWU Jun 09 '20 at 18:44

3 Answers

2

There is no need to call .extract(); you can use .find_next_sibling() with the parameter text=True:

from bs4 import BeautifulSoup


txt = '''<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>'''

soup = BeautifulSoup(txt, 'html.parser')

print(soup.h2.strong.find_next_sibling(text=True))

Prints:

 AS20194 Multi-functional Massage Chair
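The same idea can be applied to the structure from the question, where the h2 sits inside a div with class content. The following is a minimal sketch, not a definitive implementation: the wrapping div is reconstructed from the question's description, the variable names are illustrative, and it assumes a Beautiful Soup version that accepts string=True (the newer spelling of text=True). The .strip() call removes the newline and leading space that surround the text node.

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's <h2>, wrapped in the div it describes.
html = '''<div class="content">
<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.find('div', attrs={'class': 'content'}).h2

# The text node that follows <strong> is its next sibling;
# string=True restricts the search to text nodes.
model = h2.strong.find_next_sibling(string=True).strip()
print(model)
```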
Andrej Kesely
0

I'm not a Beautiful Soup expert, but as far as I can see .text is behaving correctly: it returns all of the text inside the h2, including the text of nested tags. You might check the bs4 documentation for a way to select only the text that is not wrapped in a formatting tag.
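One way to select only the text that is not wrapped in another tag is to ask for the h2's direct text nodes. This is a rough sketch under a few assumptions: the wrapping div is reconstructed from the question, the variable names are illustrative, and find_all is called with string=True and recursive=False so that text nested inside strong is skipped.

```python
from bs4 import BeautifulSoup

# Assumed markup: the question's <h2>, wrapped in the div it describes.
html = '''<div class="content">
<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.find('div', attrs={'class': 'content'}).h2

# recursive=False limits the search to direct children of <h2>,
# so the text inside <strong> is not included.
direct_text = h2.find_all(string=True, recursive=False)

# Drop whitespace-only nodes and trim the rest.
title = ' '.join(s.strip() for s in direct_text if s.strip())
print(title)
```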

Seb
  • The Brookstone part is inside <strong> and I would like to get rid of that in the result. Any thoughts? – PURWU Jun 09 '20 at 15:54
0

You can use extract() to remove the strong tag before reading the text. You can try it like this:

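from bs4 import BeautifulSoup

html_doc = """<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>"""

soup = BeautifulSoup(html_doc, 'lxml')

# Remove every <strong> tag (and the text inside it) from the parsed tree.
for strong in soup.find_all("strong"):
    strong.extract()

# strip=True trims the whitespace left around the remaining text.
print(soup.get_text(strip=True))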

Output will be:

AS20194 Multi-functional Massage Chair
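Note that extract() modifies the parsed tree, so the brand name is no longer available from soup afterwards. If both parts are needed, one option is to work on a copy of the h2. The sketch below is only an illustration: the wrapping div is reconstructed from the question, the variable names are hypothetical, and it relies on copy.copy() producing a detached copy of a tag, which Beautiful Soup documents for recent versions.

```python
import copy

from bs4 import BeautifulSoup

# Assumed markup: the question's <h2>, wrapped in the div it describes.
html = '''<div class="content">
<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.find('div', attrs={'class': 'content'}).h2

# Work on a detached copy so the original tree keeps its <strong> tag.
h2_copy = copy.copy(h2)
for strong in h2_copy.find_all('strong'):
    strong.extract()

title = h2_copy.get_text(strip=True)  # text without the brand
brand = h2.strong.get_text(strip=True)  # still available in the original
print(brand)
print(title)
```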
Humayun Ahmad Rajib