I'm trying to extract some information from some HTML using Python's BeautifulSoup.

Subsection of the HTML:

<div class="ui-grid-canvas">
                            <!-- -->
                            <div class="ui-grid-row" ng-class="{'ui-grid-tree-header-row': row.treeLevel &gt; -1, 'ui-grid-row-dirty': row.isDirty, 'ui-grid-row-saving': row.isSaving, 'ui-grid-row-error': row.isError,'ui-grid-row-selected': row.isSelected}" ng-repeat="(rowRenderIndex, row) in rowContainer.renderedRows track by $index" ng-style="Viewport.rowStyle(rowRenderIndex)">
                                <div role="row" row-render-index="rowRenderIndex" ui-grid-row="row">
                                    <div role="row">
                                        <!-- -->
                                        <div class="ui-grid-cell ui-grid-coluiGrid-0005" ng-class="{sorted: col.name==$parent.$parent.$parent.$parent.$parent.$parent.$parent.datatableImpl.sortedColumn}" ng-repeat="(colRenderIndex, col) in colContainer.renderedColumns track by col.uid" role="gridcell" tabindex="0" ui-grid-cell="">
                                            <div class="ui-grid-cell-contents" ng-bind-html="row.entity[col.field].content" title="Alnwick-Haldimand">Alnwick-Haldimand</div>
                                        </div>
                                        <!-- -->
                                        <div class="ui-grid-cell ui-grid-coluiGrid-0006" ng-class="{sorted: col.name==$parent.$parent.$parent.$parent.$parent.$parent.$parent.datatableImpl.sortedColumn}" ng-repeat="(colRenderIndex, col) in colContainer.renderedColumns track by col.uid" role="gridcell" tabindex="0" ui-grid-cell="">
                                            <div class="ui-grid-cell-contents" ng-bind-html="row.entity[col.field].content" title="Alderville Community Centre">Alderville Community Centre</div>
                                        </div>
                                        <!-- -->
                                        <div class="ui-grid-cell ui-grid-coluiGrid-0007" ng-class="{sorted: col.name==$parent.$parent.$parent.$parent.$parent.$parent.$parent.datatableImpl.sortedColumn}" ng-repeat="(colRenderIndex, col) in colContainer.renderedColumns track by col.uid" role="gridcell" tabindex="0" ui-grid-cell="">
                                            <div class="ui-grid-cell-contents" ng-bind-html="row.entity[col.field].content" title="Under construction">Under construction</div>
                                        </div>
                                        <!-- -->
                                        <div class="ui-grid-cell ui-grid-coluiGrid-0008" ng-class="{sorted: col.name==$parent.$parent.$parent.$parent.$parent.$parent.$parent.datatableImpl.sortedColumn}" ng-repeat="(colRenderIndex, col) in colContainer.renderedColumns track by col.uid" role="gridcell" tabindex="0" ui-grid-cell="">
                                            <div class="ui-grid-cell-contents" ng-bind-html="row.entity[col.field].content" title="March 2018">March 2018</div>
                                        </div>
                                        <!-- -->
                                    </div>
                                </div>
                                <!-- -->
                                <!-- -->
                            </div>

I'm encountering a strange error. The problem occurs in the following block of code:

 table = page_soup.findAll('div',attrs={"class" : "ui-grid-canvas"})
 print(type(table[0]))

 rows = table[0].findAll('div',attrs={"class": "ui-grid-row"})
 print(type(rows[0]))

 cell = rows[0].findALL('div')
 print(type(cells))

These lines return the following:

 <class 'bs4.element.Tag'>
 <class 'bs4.element.Tag'>

 TypeError                                 Traceback (most recent call last)

 <ipython-input-56-13fce9e4b865> in <module>()
       5 print(type(rows[0]))  
       6 
 ----> 7 cell = rows[0].findALL('div')
       8 print(type(cells))

 TypeError: 'NoneType' object is not callable

Why does this raise a TypeError when the type check directly above shows that rows[0] is a bs4.element.Tag, and the same kind of call worked on table[0]?

Using Ubuntu, Python 3.6 and BS4.

Thanks in advance.

1 Answer

The error occurs because, from the second line on, the markup contains comments (the <!-- --> lines) rather than ordinary markup elements. BeautifulSoup's tag-search methods normally don't return comments, and that's why your rows element is empty.
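Independently of the comments, the line the traceback points at is worth a second look: it calls `findALL`, while the lines above it call `findAll`. As far as I know, bs4 treats an attribute name it does not recognise on a Tag as a child-tag lookup, so a misspelled method name comes back as None instead of raising AttributeError, and calling it produces this kind of TypeError. A minimal sketch of that behaviour, where the cut-down `html` string is a hypothetical stand-in for the grid in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down markup standing in for one grid row from the question.
html = """
<div class="ui-grid-row">
  <!-- -->
  <div class="ui-grid-cell-contents">Alnwick-Haldimand</div>
  <!-- -->
  <div class="ui-grid-cell-contents">March 2018</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("div", attrs={"class": "ui-grid-row"})

# "findALL" is not a bs4 method, so attribute access falls back to
# looking for a child tag named "findALL", which does not exist here.
misspelled = row.findALL
print(misspelled)

# Calling that None value is what raises the TypeError.
try:
    row.findALL("div")
except TypeError as exc:
    message = str(exc)
    print(message)

# With the correctly spelled method the nested divs are returned,
# and the comment lines in between are simply skipped.
cells = row.find_all("div")
texts = [cell.get_text() for cell in cells]
print(texts)
```

If this matches what you see, spelling the call as `findAll` or `find_all` (and printing the same variable name you assigned to) should get past the error.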

To access comments, use the Comment object from bs4. I've answered a similar question here: Accessing commented HTML Lines with BeautifulSoup
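As a rough illustration of the Comment approach, the sketch below collects the comment nodes from a small document. The `html` string and its comment text are made up for the example, and the `string=` argument assumes a reasonably recent bs4 (older releases used `text=` instead):

```python
from bs4 import BeautifulSoup, Comment

# Made-up markup: two comments around a single cell.
html = """
<div class="ui-grid-canvas">
  <!-- first marker -->
  <div class="ui-grid-cell-contents">Under construction</div>
  <!-- second marker -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A tag search only sees elements: the outer div and the inner div.
tags = soup.find_all("div")
tag_count = len(tags)

# Comments are string nodes, so they are matched with a string filter
# that keeps only instances of Comment.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
comment_texts = [comment.strip() for comment in comments]
print(tag_count, comment_texts)
```

Each item in `comments` behaves like a string, so it can be stripped, compared, or parsed again with BeautifulSoup if it holds markup.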

Dmitriy Fialkovskiy