I'm stuck on a bit of a head-scratcher while working with a table in Python 3 and Selenium. I am trying to extract data from a few columns of a table (tblGuid).
The data appears to be retrieved correctly (len(rows) prints the expected number of rows), but the loop seems to get stuck on the first element: it prints the same socket value over and over, and the number of prints matches len(rows).
import time

# browser is the (non-headless) Firefox WebDriver instance created earlier in the script
vlan = "vlan14"
time.sleep(3)
# Enter filter for vlan
print("Filtered by vlan: " + vlan)
browser.find_element_by_xpath("/html/body/div[1]/div[4]/div[3]/div[4]/div/div[2]/div/div[1]/div[3]/div/table/tfoot/tr/th[13]/input").send_keys(vlan)
# Sort by socket
browser.find_element_by_xpath("/html/body/div[1]/div[4]/div[3]/div[4]/div/div[2]/div/div[1]/div[1]/div/table/thead/tr/th[14]").click()
time.sleep(2)
table = browser.find_element_by_id('tblGuid')
rows = table.find_elements_by_xpath(".//tr")
time.sleep(2)
print("Len: ", len(rows))
for row in rows:
    socket = row.find_element_by_xpath('//td[10]').text
    print("Socket: ", socket)
    # Other lookups of the same kind go here: get a different column and assign it to a variable.
browser.quit()
I am running this code with Firefox in non-headless mode to confirm that all clicks, sorts, and filters are applied as intended. The browser output looks as expected and the data is all there, with socket being a number that varies between 1 and 52 over roughly 50 rows. It seems to me that the for loop gets stuck on the first element of rows.
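A minimal offline sketch of the two kinds of lookup, using Python's standard-library xml.etree.ElementTree instead of Selenium. In XPath, a path that starts with // is resolved from the document root even when it is evaluated on a row element, while a path starting with ./ or .// is resolved from that row. ElementTree refuses absolute paths on an element, so the document-wide search is approximated here by searching from the table root. The four-column table and SOCKET_COL are hypothetical stand-ins, not the real page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical four-column version of tblGuid (the real table has 20 columns).
HTML = """
<table id="tblGuid">
  <tbody>
    <tr><td>0046ca</td><td>vlan14</td><td>1</td><td>Done</td></tr>
    <tr><td>004aa4</td><td>vlan14</td><td>2</td><td>Done</td></tr>
  </tbody>
</table>
""".strip()

SOCKET_COL = 3  # hypothetical position of the socket cell in this trimmed table

table = ET.fromstring(HTML)
rows = table.findall(".//tr")


def document_wide_lookup(row):
    # Approximates an absolute XPath such as //td[3]: the search starts at the
    # top of the tree, so the row argument is ignored and the first match in
    # document order (the first row's cell) is returned every time.
    return table.find(f".//tr/td[{SOCKET_COL}]").text


def row_relative_lookup(row):
    # Approximates a relative XPath such as ./td[3]: the search starts at this row.
    return row.find(f"td[{SOCKET_COL}]").text


for row in rows:
    print("document-wide:", document_wide_lookup(row),
          "| row-relative:", row_relative_lookup(row))
```

Run as-is, the document-wide lookup returns the first row's value for every row, while the row-relative lookup returns each row's own value.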
I have added a lot of (probably redundant) time.sleep() calls to make sure the page is loaded properly, and so that I can watch the page update as the script progresses.
It is probably worth mentioning that the page I am scraping does not contain the table data in its initial HTML; the table is populated by JavaScript from a database. At first I thought this was the problem, but the socket value being printed (like every other column I read) matches the first row of the table, which tells me the data is being retrieved correctly and I am simply failing to iterate over it.
EDIT: A cleaned-up version of the HTML:
<table id="tblGuid" class="table table-striped table-hover table-condensed detailedTable table-bordered dataTable" style="width: 99.9%;" role="grid" aria-describedby="tblGuid_info">
<tbody>
<tr role="row" class="odd">
<td><button class="tableButton regguid" data-guid="0046ca">Reg.</button></td>
<td>0046ca</td>
<td>0110F17754</td>
<td>A18122</td>
<td><a href="detail?serial=37530" target="_blank">37530</a></td>
<td>05929a</td>
<td>3.0.0</td>
<td>19-12-21 19:56</td>
<td>20-01-19 19:53</td>
<td>20-01-19 19:53</td>
<td>20526661632</td>
<td>1</td>
<td>vlan14</td>
<td class="sorting_1">1</td>
<td>0</td>
<td><a data-node-error="0" data-node-guid="0046ca" href="#"> 0</a></td>
<td><a href="qc?rclId=1279" target="_blank">145811</a></td>
<td>5554</td>
<td>152263</td>
<td>Done</td>
</tr>
<tr role="row" class="even">
<td><button class="tableButton regguid" data-guid="004aa4">Reg.</button></td>
<td>004aa4</td>
<td>0110F17D8D</td>
<td>A19108</td>
<td><a href="detail?serial=37740" target="_blank">37740</a></td>
<td>05936c</td>
<td>3.0.0</td>
<td>19-12-21 20:15</td>
<td>20-01-19 19:54</td>
<td>20-01-19 19:54</td>
<td>20517699584</td>
<td>1</td>
<td>vlan14</td>
<td class="sorting_1">2</td>
<td>0</td>
<td><a data-node-error="0" data-node-guid="004aa4" href="#"> 0</a></td>
<td><a href="qc?rclId=1277" target="_blank">147011</a></td>
<td>5548</td>
<td>152311</td>
<td>Done</td>
</tr>
</tbody>
</table>
Notes on the above HTML:
- Around 40 table rows removed for readability.
- The table header and footer have been removed.
- Some cell data has been altered for the purpose of this post; the structure remains the same.
- This is how the table appears under "Inspect Element" in Firefox.
- The XPaths in the Python code were obtained with "Copy -> XPath" under Inspect Element.