Here is the beginning of the source code:

<div id="users_table" class="security_slick_container slickgrid_300610 ui-widget" style="overflow: hidden; outline: 0px; position: relative;">
    <div tabindex="0" hidefocus="" style="position:fixed;width:0;height:0;top:0;left:0;outline:0;"></div>
    <div class="slick-header ui-state-default" style="overflow:hidden;position:relative;">
        <div class="slick-header-columns" style="left: -1000px; width: 2132px;" unselectable="on">
            <div class="ui-state-default slick-header-column slick-header-sortable slick-header-column-sorted" id="slickgrid_300610userName" title="" style="width: 94px;"><span class="slick-column-name"><strong>Username:</strong></span><span class="slick-sort-indicator slick-sort-indicator-asc"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column slick-header-sortable" id="slickgrid_300610firstName" title="" style="width: 89px;"><span class="slick-column-name"><strong>Firstname:</strong></span><span class="slick-sort-indicator"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column slick-header-sortable" id="slickgrid_300610lastName" title="" style="width: 109px;"><span class="slick-column-name"><strong>Lastname:</strong></span><span class="slick-sort-indicator"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column slick-header-sortable" id="slickgrid_300610type" title="" style="width: 124px;"><span class="slick-column-name"><strong>Type:</strong></span><span class="slick-sort-indicator"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column slick-header-sortable" id="slickgrid_300610crew" title="" style="width: 109px;"><span class="slick-column-name"><strong>Crew:</strong></span><span class="slick-sort-indicator"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column slick-header-sortable" id="slickgrid_300610jobTitle" title="" style="width: 109px;"><span class="slick-column-name"><strong>Job title:</strong></span><span class="slick-sort-indicator"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column slick-header-sortable" id="slickgrid_300610defaultPriceClass" title="" style="width: 124px;"><span class="slick-column-name"><strong>Defaultprice class:</strong></span><span class="slick-sort-indicator"></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column" id="slickgrid_300610description" title="" style="width: 129px;"><span class="slick-column-name"><strong>Description:</strong></span>
                <div class="slick-resizable-handle"></div>
            </div>
            <div class="ui-state-default slick-header-column" id="slickgrid_300610language" title="" style="width: 39px;"><span class="slick-column-name"><strong>Language:</strong></span>
                <div class="slick-resizable-handle"></div>
            </div>
        </div>
    </div>
    <div class="slick-headerrow ui-state-default" style="overflow: hidden; position: relative; display: none;">
        <div class="slick-headerrow-columns" style="width: 1115px;"></div>
        <div style="display: block; height: 1px; position: absolute; top: 0px; left: 0px; width: 1132px;"></div>
    </div>
    <div class="slick-top-panel-scroller ui-state-default" style="overflow: hidden; position: relative; display: none;">
        <div class="slick-top-panel" style="width:10000px"></div>
    </div>
    <div class="slick-viewport" style="width: 100%; overflow: auto; outline: 0px; position: relative; height: 567px;">
        <div class="grid-canvas" style="height: 16550px; width: 1115px;">
            <div class="ui-widget-content slick-row even" style="top:0px">
                <div class="slick-cell l0 r0">john.smith</div>
                <div class="slick-cell l1 r1">John</div>
                <div class="slick-cell l2 r2">Smith</div>
                <div class="slick-cell l3 r3">Contractor</div>
                <div class="slick-cell l4 r4">Microsoft</div>
                <div class="slick-cell l5 r5">Sales manager</div>
                <div class="slick-cell l6 r6">A</div>
                <div class="slick-cell l7 r7"></div>
                <div class="slick-cell l8 r8">en</div>
            </div>
            <div class="ui-widget-content slick-row odd" style="top:25px">
                <div class="slick-cell l0 r0">robert.geits</div>
                <div class="slick-cell l1 r1">Robert</div>
                <div class="slick-cell l2 r2">Geits</div>
                <div class="slick-cell l3 r3">Staff</div>
                <div class="slick-cell l4 r4">Google</div>
                <div class="slick-cell l5 r5">Project manager</div>
                <div class="slick-cell l6 r6">B</div>
                <div class="slick-cell l7 r7"></div>
                <div class="slick-cell l8 r8">de</div>
            </div>
            <div class="ui-widget-content slick-row even" style="top:50px">
                <div class="slick-cell l0 r0">amir.rooney</div>
                <div class="slick-cell l1 r1">Amir</div>
                <div class="slick-cell l2 r2">Rooney</div>
                <div class="slick-cell l3 r3">Staff</div>
                <div class="slick-cell l4 r4">Microsoft</div>
                <div class="slick-cell l5 r5">Sales manager</div>
                <div class="slick-cell l6 r6">A</div>
                <div class="slick-cell l7 r7"></div>
                <div class="slick-cell l8 r8">en</div>
            </div>
        </div>
    </div>
    <div tabindex="0" hidefocus="" style="position:fixed;width:0;height:0;top:0;left:0;outline:0;"></div>
</div>

I am using the following code to parse the table. Any idea why only part of the table is parsed? There are a lot of records, but the current code handles only usernames starting with A and some starting with B, and then stops without raising any errors.

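from selenium import webdriver

driver = webdriver.Chrome()  # assumed browser; login and page navigation omitted

# Select the table container, then print the text of every div inside it
for tr in driver.find_elements_by_xpath('//*[@id="users_table"]'):
    tds = tr.find_elements_by_tag_name('div')
    print([td.text for td in tds])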

I use Selenium because I need to log in before parsing.

10101
  • Are you waiting for the page to fully load before doing the find_elements? You said yourself it's a big page with a lot of records. If these are delivered by scripts, then Selenium could be running and getting a partial result before the page is complete. Try a fixed wait to see if this increases your number of hits; if that works, implement a WebDriverWait on an object to make sure the page is ready. Happy to help more, just say whether the fixed wait works and if you need more info :-) – RichEdwards Jul 30 '20 at 11:29
  • @RichEdwards I just investigated in more detail and noticed that the table is filled with values on scroll, so its values in the source code change as I scroll. Is there any way to parse such a case? I haven't seen such a beauty before. The page is fully loaded. – 10101 Jul 30 '20 at 11:30
  • I literally helped someone this morning with a scroll issue, but you'll need a way to scroll and then get your tr's. If you can share a link, I can have a look at something better than someone else's question: https://stackoverflow.com/questions/63168429/web-scraping-python-fails-to-load-the-url-on-button-click/63168989#63168989 – RichEdwards Jul 30 '20 at 11:48
  • Before jumping to conclusions, can you tell me the number of rows it's showing: `print(len(driver.find_elements_by_css_selector('div.slick-row')))`? That way we can determine whether it's a dynamically loading table or something else. – supputuri Jul 30 '20 at 12:55

1 Answer

Try the logic below.

# iterate over every row currently rendered in the grid
for tr in driver.find_elements_by_css_selector('div.slick-row'):
    # get the cells (columns) in the row
    tds = tr.find_elements_by_tag_name("div")
    # print the text of every cell in the row
    print([td.text for td in tds])
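
As noted in the comments, the grid fills in rows only as it is scrolled, so the loop above will most likely see only the rows rendered at that moment. A minimal sketch of one way around this, assuming the `#users_table .slick-viewport` structure shown in the question and that each row's `top` offset is unique: scroll the viewport one screen at a time and collect rows keyed by that offset. The name `scrape_virtual_grid` and the `pause` and `max_steps` parameters are hypothetical, chosen for illustration.

```python
import time

# Returns the rows currently rendered as [top_offset, [cell texts]] pairs.
# Selectors are taken from the HTML in the question.
READ_ROWS_JS = """
const rows = document.querySelectorAll('#users_table div.slick-row');
return Array.from(rows).map(r => [
    parseInt(r.style.top, 10),
    Array.from(r.querySelectorAll('div.slick-cell')).map(c => c.textContent)
]);
"""

# Scrolls the viewport down by one screen; returns false once it stops moving.
SCROLL_JS = """
const vp = document.querySelector('#users_table .slick-viewport');
const before = vp.scrollTop;
vp.scrollTop = before + vp.clientHeight;
return vp.scrollTop > before;
"""


def scrape_virtual_grid(driver, pause=0.5, max_steps=1000):
    """Collect all rows of a scroll-rendered grid (hypothetical helper)."""
    rows = {}  # top offset -> cell texts; rows seen twice are de-duplicated
    for _ in range(max_steps):
        for top, cells in driver.execute_script(READ_ROWS_JS):
            rows[top] = cells
        if not driver.execute_script(SCROLL_JS):
            break  # bottom reached, nothing new to render
        time.sleep(pause)  # assumed delay for the grid to render new rows
    return [rows[top] for top in sorted(rows)]
```

After logging in and opening the page, `for row in scrape_virtual_grid(driver): print(row)` should print one list of cell texts per user. The fixed `pause` is a guess; if rows are missed, a longer pause or an explicit wait may be needed.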

supputuri