0

How should I go about finding the depth of this block -

<div>
    <div>
        <div>
        </div>
    </div>
</div>

In this case, it should be 3. Any clue/code will help.

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140

1 Answers1

6

There are many ways to do this. I would not recommend using a regex to parse XML though.

One way is to use the HTMLParser that comes standard with Python:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 1

    def handle_starttag(self, tag, attrs):
        print 'Encountered %s at depth %d.' % (tag, self.depth)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

if __name__ == '__main__':
    html = '''
    <div>
        <div>
            <div>
            </div>
        </div>
    </div>
    '''

    MyHTMLParser().feed(html)

Running this script produces:

Encountered div at depth 1.
Encountered div at depth 2.
Encountered div at depth 3.
Marijn van Vliet
  • 5,239
  • 2
  • 33
  • 45