Test if children tag exists in beautifulsoup

Question

I have an XML file with a defined structure but a varying number of tags, like

file1.xml:

<document>
  <subDoc>
    <id>1</id>
    <myId>1</myId>
  </subDoc>
</document>

file2.xml:

<document>
  <subDoc>
    <id>2</id>
  </subDoc>
</document>

Now I would like to check whether the tag myId exists. So I did the following:

data = open("file1.xml",'r').read()
xml = BeautifulSoup(data)

hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc,'myID')
hasType = type(xml.document.subdoc.myid)

The result is for file1.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType ->   <class 'bs4.element.Tag'>

file2.xml:

hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>

Okay, <myId> is not an attribute of <subdoc>.

But how can I test whether a sub-tag exists?

//Edit: By the way, I don't really want to iterate through the whole subdoc, because that will be very slow. I hope to find a way to address or query that element directly.

score 42 · Answer 1 · answered Jan 12 '16 at 10:31

if tag.find('child_tag_name'):
    ...  # the child tag exists
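
A minimal sketch of this check applied to the question's two documents. It assumes the files are inlined as strings and parsed with Python's built-in html.parser, which lowercases tag names (hence 'myid'). The helper name has_child_tag is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Inline copies of the question's two files (assumption: parsed with html.parser,
# which lowercases tag names, so <myId> is stored as <myid>).
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_child_tag(tag, name):
    """Return True if `tag` has a descendant tag called `name` (hypothetical helper)."""
    # find() returns the first matching Tag, or None when nothing matches.
    return tag.find(name) is not None


soup1 = BeautifulSoup(FILE1, "html.parser")
soup2 = BeautifulSoup(FILE2, "html.parser")

print(has_child_tag(soup1.subdoc, "myid"))  # True
print(has_child_tag(soup2.subdoc, "myid"))  # False
```

Comparing against None makes the intent explicit, although a plain truthiness test on the result of find() behaves the same way here.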

ahuigo

wpercy · Accepted Answer · 2020-05-07T18:12:07.160

16

The simplest way to check whether a child tag exists is

childTag = xml.find('childTag')
if childTag is not None:
    ...  # do stuff

More specifically to OP's question:

If you don't know the structure of the XML doc, you can use the .find() method of the soup. Something like this:

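from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    # html.parser lowercases tag names, so <myId> is stored as <myid>
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

    hasAttrBs = xml.find("myid")    # the <myid> tag
    hasAttrBs2 = xml2.find("myid")  # None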

If you do know the structure, you can get the desired element by accessing the tag names as attributes, like this: xml.document.subdoc.myid. So the whole thing would go something like this:

from bs4 import BeautifulSoup

with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")

    hasAttrBs = xml.document.subdoc.myid
    hasAttrBs2 = xml2.document.subdoc.myid
    print(hasAttrBs)
    print(hasAttrBs2)

Prints

<myid>1</myid>
None
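
The lowercase <myid> in that output comes from the parser: HTML parsers normalize tag names to lowercase, and find() compares names case-sensitively. A small sketch of the consequence, assuming Python's built-in html.parser and an inline copy of file1.xml:

```python
from bs4 import BeautifulSoup

# Inline copy of file1.xml from the question (assumption: html.parser is used).
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"

soup = BeautifulSoup(FILE1, "html.parser")

# html.parser lowercases tag names, so the mixed-case spelling does not match.
mixed_case = soup.find("myId")
lower_case = soup.find("myid")

print(mixed_case)  # None
print(lower_case)  # <myid>1</myid>
```

If the original capitalization matters, the "xml" parser (which requires lxml) preserves tag-name case. The lookups would then have to use subDoc and myId instead.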

answered Oct 21 '15 at 15:09

...but `find()` searches through the document, right? But I know the position of the tag inside the XML tree (if it exists). So is there no easy way to directly address an element or check whether it exists? – The Bndr Oct 29 '15 at 15:00
Oh okay, I'm sorry I misunderstood the first time. I've updated my answer. – wpercy Oct 29 '15 at 15:38
Oh, I see.... "Keep it simple" is sometimes the best way. Thank you for opening my eyes... – The Bndr Nov 02 '15 at 13:25

score 4 · Answer 3 · edited Jan 23 '19 at 00:33

Here's an example to check if h2 tag exists in an Instagram URL. Hope you find it useful:

import requests
from bs4 import BeautifulSoup

instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")

# find() returns None when the page contains no <h2> tag
if not soup.find('h2'):
    print("didn't find h2")

edited Jan 23 '19 at 00:33

GustavoIP

answered Oct 12 '16 at 01:07

Mona Jalal

This line right here " if not soup.find('h2'):" just saved me tons of headaches. I didn't know about this. Thank you! – M4cJunk13 Jan 06 '18 at 01:28
Within bs4 tags, use `has_attr(key)` to test for an attribute instead, like `alt_image_text = [tag["alt"] for tag in images if tag.has_attr("alt")]`. Note that tag.src always seems to return None. – Marc Maxmeister Nov 06 '18 at 19:32
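
The last comment touches on the same confusion as the question: has_attr() tests attributes of a tag, while dotted access and find() look for child tags. A short sketch with hypothetical markup, assuming html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: an <img> with attributes inside a <div>.
soup = BeautifulSoup('<div><img src="a.png" alt="logo"></div>', "html.parser")
img = soup.img

# has_attr() tests for an attribute on the tag itself.
print(img.has_attr("alt"))    # True
print(img.has_attr("title"))  # False

# Dotted access looks for a child *tag* named "src", not the attribute.
print(img.src)     # None
print(img["src"])  # a.png

# A child tag is tested with find() (or dotted access), not has_attr().
print(soup.div.has_attr("img"))          # False
print(soup.div.find("img") is not None)  # True
```

This is why has_attr('myID') returned False for both files in the question: myId is a child tag of subdoc, not one of its attributes.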

score 1 · Answer 4 · answered Oct 20 '15 at 13:57

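You can handle it like this:

def has_child(parent, name):
    # Check only the direct children; HTML parsers lowercase names ('myid')
    for child in parent.children:
        if getattr(child, 'name', None) == name:
            return True
    return False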

chyoo CHENG

Thank you. But the thing is that I don't really want to iterate through the whole subdoc, because these are large docs and I have to walk through thousands of XML files. I hope to find a way to address or query that element directly. – The Bndr Oct 20 '15 at 14:02

score 1 · Answer 5 · answered Oct 23 '19 at 08:58

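You can do it with if tag.myid: (HTML parsers lowercase tag names, so myId becomes myid).

If you want to check that myid is a direct child rather than a deeper descendant, use if tag.find("myid", recursive=False):

If you want to check whether tag has any child tag at all, use if tag.find(True): (negate it with not to test that there is none).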
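
A short sketch of these variants. The markup is hypothetical (an extra empty <leaf> tag is added for illustration) and html.parser with lowercase tag names is assumed:

```python
from bs4 import BeautifulSoup

# Hypothetical document: <myid> is a direct child of <subdoc>
# and therefore only a grandchild of <document>.
soup = BeautifulSoup(
    "<document><subdoc><id>1</id><myid>1</myid></subdoc><leaf></leaf></document>",
    "html.parser",
)
document = soup.find("document")
subdoc = document.find("subdoc")
leaf = document.find("leaf")

# The default search is recursive: any descendant matches.
print(document.find("myid") is not None)                   # True
# recursive=False restricts the search to direct children.
print(document.find("myid", recursive=False) is not None)  # False
print(subdoc.find("myid", recursive=False) is not None)    # True

# find(True) matches any tag, so it tells whether a tag has any child tag.
print(subdoc.find(True) is not None)  # True
print(leaf.find(True) is not None)    # False
```

With recursive=False only the direct children are inspected, which addresses the concern about searching the whole subtree.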

LF00

user2458922 · Answer 6 · 2019-10-30T20:41:09.743

import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
soup = BeautifulSoup(page.content, 'html.parser')
testNode = list(soup.children)[1]

def hasChild(node):
    # Only Tag objects can contain children; strings and comments cannot
    return isinstance(node, Tag) and len(node.contents) > 0

if hasChild(testNode):
    firstChild = list(testNode.children)[0]
    if hasChild(firstChild):
        print('I found Grand Child ')

score 0 · Answer 7 · answered Oct 03 '21 at 06:26

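If you are using a CSS selector:

def select_or_none(soup_elm, selector):
    # select() returns an empty list when nothing matches
    content = soup_elm.select(selector)
    if len(content) == 0:
        return None
    return content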
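
Applied to the question's XML, a selector can also encode the expected position of the tag. This is a sketch under assumptions: the documents are inlined as strings, html.parser is used (so names are lowercase), and has_myid is a hypothetical helper. select() relies on the soupsieve package, which is installed as a dependency of recent bs4 releases.

```python
from bs4 import BeautifulSoup

# Inline copies of the question's two files.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_myid(markup):
    """Hypothetical helper: True if document > subdoc > myid exists."""
    soup = BeautifulSoup(markup, "html.parser")
    # "a > b" matches b only when it is a direct child of a.
    matches = soup.select("document > subdoc > myid")
    # select() always returns a list; it is empty when nothing matches.
    return len(matches) > 0


print(has_myid(FILE1))  # True
print(has_myid(FILE2))  # False
```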

XY L

score 0 · Answer 8 · edited Apr 14 '22 at 13:09

You could also try it this way:

import requests
from bs4 import BeautifulSoup

response = requests.get("Your URL here")
soup = BeautifulSoup(response.text, 'lxml')
RESULT = soup.select_one('CSS_SELECTOR_HERE')  # first match, or None if nothing matches
print(RESULT)

Note that CSS selector support in bs4 differs slightly from other selector implementations; see the Beautiful Soup documentation on CSS selectors for details.

soup.select returns all matching elements and also supports attribute selectors.
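
A small sketch contrasting the two calls, using hypothetical markup and html.parser so that it runs without a network request:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = '<ul><li class="item" data-id="1">a</li><li class="item">b</li></ul>'
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one("li.item")      # first match, or None
missing = soup.select_one("li.absent")  # None: nothing matches
all_items = soup.select("li.item")      # list of every match
with_attr = soup.select("li[data-id]")  # attribute selector

print(first.get_text())  # a
print(missing is None)   # True
print(len(all_items))    # 2
print(len(with_attr))    # 1
```

For an existence test, select_one() compared against None is usually enough. select() is the better fit when every match is needed.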
