Python BeautifulSoup only select top tag

Question

I've run into a problem. It might be very easy, but I couldn't find it in the documentation.

Here is the target HTML structure, which is very simple.

<h3>Top 
    <em>Mid</em>
    <span>Down</span>
</h3>

I want to get only the "Top" text directly inside the h3 tag, so I wrote this:

from bs4 import BeautifulSoup
html = "<h3>Top <em>Mid </em><span>Down</span></h3>"
soup = BeautifulSoup(html, "html.parser")
print(soup.select("h3")[0].text)

But it returns "Top Mid Down". How do I modify it to get only "Top"?

Padraic Cunningham · Accepted Answer · 2016-07-25T10:55:39.143

1

You can use find with text=True and recursive=False, which returns only the first text node that is a direct child of the tag:

In [2]: from bs4 import BeautifulSoup
   ...: html ="<h3>Top <em>Mid </em><span>Down</span></h3>"
   ...: soup = BeautifulSoup(html,"html.parser")
   ...: print(soup.find("h3").find(text=True,recursive=False))
   ...: 
Top

Depending on the structure of the HTML, there are several other ways to get it:

print(soup.find("h3").contents[0])
print(next(soup.find("h3").children))
print(soup.find("h3").next)
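As a rough sketch of when these variants matter, assuming bs4 is installed: the first child is only the "Top" text when the text comes before any child tag. If the layout may vary, collecting every direct text node could be more robust. The helper name direct_text below is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

def direct_text(tag):
    """Join the text nodes that are direct children of tag (hypothetical helper)."""
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()

html = "<h3>Top <em>Mid </em><span>Down</span></h3>"
soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")

first_child = h3.contents[0]   # first child node, here the text "Top "
top_text = direct_text(h3)     # only the text directly inside <h3>

# Same helper when the text comes after a child tag
reordered = BeautifulSoup("<h3><em>Mid</em> Top</h3>", "html.parser").find("h3")
reordered_text = direct_text(reordered)

print(repr(first_child), repr(top_text), repr(reordered_text))
```

With the reordered markup, contents[0] would be the em tag, while the helper still returns only the text that sits directly in the h3.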

edited Jul 25 '16 at 10:55

answered Jul 25 '16 at 10:48

Padraic Cunningham

Thanks, I will check out more details about `contents` and `children` – rj487 Jul 25 '16 at 11:34

score 0 · Answer 2 · edited May 23 '17 at 12:14

Try something like this:

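from bs4 import BeautifulSoup
html = "<h3>Top <em>Mid </em><span>Down</span></h3>"
soup = BeautifulSoup(html, "html.parser")
# select() returns a list, so index it before calling findChildren()
print(soup.select("h3")[0].findChildren()[0])  # first child tag: <em>Mid </em>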

Though I am not entirely sure. Check this as well - How to find children of nodes using Beautiful Soup

Basically, you need to find the first child node.
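To illustrate what "first child node" means here, a small sketch assuming bs4 with the built-in html.parser: findChildren() returns only tags, whereas .contents also includes text nodes, so the leading "Top" text shows up only in the latter.

```python
from bs4 import BeautifulSoup

html = "<h3>Top <em>Mid </em><span>Down</span></h3>"
soup = BeautifulSoup(html, "html.parser")
h3 = soup.select("h3")[0]        # select() returns a list, so index it first

child_tags = h3.findChildren()   # tags only: the <em> and <span> elements
child_nodes = h3.contents        # text nodes and tags, in document order

print(child_tags[0])             # the <em> tag
print(repr(child_nodes[0]))      # the leading text node
```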

edited May 23 '17 at 12:14

Community

answered Jul 25 '16 at 10:21

kawadhiya21

There is a syntax error in your code, but thanks for the information. – rj487 Jul 25 '16 at 11:38

score -1 · Answer 3 · answered Jul 25 '16 at 10:34

It's easy to search using a regex, something like this:

 import re
 pageid = re.search('<h3>(.*?)</h3>', curPage, re.DOTALL)  # curPage holds the page's HTML as a string

Then get the data inside the tag using the pageid.group(value) method, for example pageid.group(1) for the first captured group.
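A minimal sketch of the regex route, assuming the page is already in a string (html below stands in for curPage) and the h3 tag has no attributes. The first pattern captures everything inside the tag, child markup included. The second pattern is an assumption beyond the snippet above: it stops at the first child tag to isolate the leading text. Regexes tend to be brittle on real-world HTML, so treat this as illustrative only.

```python
import re

html = "<h3>Top <em>Mid </em><span>Down</span></h3>"

# Everything between <h3> and </h3>, including the child tags
whole = re.search(r"<h3>(.*?)</h3>", html, re.DOTALL)
inner = whole.group(1)

# Only the text before the first child tag (assumed variation)
leading = re.search(r"<h3>([^<]*)", html)
top_text = leading.group(1).strip()

print(inner)
print(top_text)
```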

Midhun Mohan

Thanks, but I thought there would be an easier way to get the content in BeautifulSoup. – rj487 Jul 25 '16 at 11:36
