I am trying to download web pages using Selenium with Python.
There is a tree view on the left side and the content on the right side.
This is the HTML of the tree view. All sub-menus are closed at first.
<ul>
  <li>
    <a href="#" onclick="openSubMenu()">item1</a>
    <ul>
      <li>
        <a href="./item2.html">item2</a>
      </li>
      <li>
        <a href="#" onclick="openSubMenu()">item3</a>
        <ul>
          <li>
            <a href="./item4.html">item4</a>
          </li>
          <li>
            <a href="#" onclick="openSubMenu()">item5</a>
            <ul>
              <li>
                <a href="./item6.html">item6</a>
              </li>
            </ul>
          </li>
        </ul>
      </li>
      <li>
        <a href="#" onclick="openSubMenu()">item7</a>
        <ul>
          <li>
            <a href="./item8.html">item8</a>
          </li>
        </ul>
      </li>
    </ul>
  </li>
  <li>
    <a href="#" onclick="openSubMenu()">item9</a>
    <ul>
      <li>
        <a href="./item10.html">item10</a>
      </li>
    </ul>
  </li>
  <li>
    <a href="#" onclick="openSubMenu()">item11</a>
    <ul>
      <li>
        <a href="./item11.html">item12</a>
      </li>
    </ul>
  </li>
</ul>
When I click an item that has a page link, the page is loaded into the iframe
on the right side; otherwise, the click opens the item's sub-menu.
I used tree recursion to open all sub-menus.
def tree_recursion(self, tree_container):
    tree_branches = tree_container.find_elements(By.XPATH, './li')
    for tree_branch in tree_branches:
        time.sleep(0.5)
        tree_branch.find_element(By.XPATH, './a').click()
        try:
            new_tree = tree_branch.find_element(By.XPATH, './ul')
            if new_tree:
                tree_recursion(new_tree)
        except:
            continue
But it didn't work. The following error occurred:
File "...\run.py", line 105, in tree_recursion
tree_branch.find_element(By.XPATH, './a').click()
File "...\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 433, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
File "...\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 410, in _execute
return self._parent.execute(command, params)
File "...\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 444, in execute
self.error_handler.check_response(response)
File "...\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 249, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=109.0.5414.75)
Stacktrace:
Backtrace:
(No symbol) [0x00B66643]
(No symbol) [0x00AFBE21]
(No symbol) [0x009FDA9D]
(No symbol) [0x00A009E4]
(No symbol) [0x00A008AD]
(No symbol) [0x00A00B30]
(No symbol) [0x00A30FAC]
(No symbol) [0x00A3147B]
(No symbol) [0x00A264C1]
(No symbol) [0x00A4FDC4]
(No symbol) [0x00A2641F]
(No symbol) [0x00A500D4]
(No symbol) [0x00A66B09]
(No symbol) [0x00A4FB76]
(No symbol) [0x00A249C1]
(No symbol) [0x00A25E5D]
GetHandleVerifier [0x00DDA142+2497106]
GetHandleVerifier [0x00E085D3+2686691]
GetHandleVerifier [0x00E0BB9C+2700460]
GetHandleVerifier [0x00C13B10+635936]
(No symbol) [0x00B04A1F]
(No symbol) [0x00B0A418]
(No symbol) [0x00B0A505]
(No symbol) [0x00B1508B]
BaseThreadInitThunk [0x7607FA29+25]
RtlGetAppContainerNamedObjectPath [0x777D7A9E+286]
RtlGetAppContainerNamedObjectPath [0x777D7A6E+238]
I've tried to solve this problem, but I couldn't find a solution, because it seems to need a dynamic selector inside the tree recursion function.
What is the best solution for the dynamic selector?
Or is there any other way to scrape this?
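To make the "dynamic selector" idea concrete, here is a rough sketch of what I have in mind. It is not tested against the real site. It assumes the stale reference comes from the tree being re-rendered after a click, so instead of holding on to `WebElement` objects it rebuilds an XPath from a tuple of 1-based `li` indexes and looks each link up again right before using it. The names `branch_xpath` and `expand_tree` and the root selector `//ul[@id='tree']` are made up for illustration (the HTML above shows no id), and sub-menu entries are recognised by their `onclick` attribute as in the HTML above.

```python
import time

XPATH = "xpath"  # the same string as selenium.webdriver.common.by.By.XPATH


def branch_xpath(root_xpath, path):
    """Return the XPath of the <li> reached from the root <ul> via 1-based indexes."""
    return root_xpath + "/" + "/ul/".join(f"li[{i}]" for i in path)


def expand_tree(driver, root_xpath, path=(), pause=0.5):
    """Open all sub-menus depth-first and return the hrefs of the page links."""
    container = branch_xpath(root_xpath, path) + "/ul" if path else root_xpath
    # Only the count is kept, so no element reference outlives a click.
    count = len(driver.find_elements(XPATH, container + "/li"))
    hrefs = []
    for index in range(1, count + 1):
        child = path + (index,)
        # Look the link up again on every iteration instead of reusing a reference.
        link = driver.find_element(XPATH, branch_xpath(root_xpath, child) + "/a")
        if link.get_attribute("onclick"):
            link.click()       # opens the sub-menu
            time.sleep(pause)  # crude fixed wait; an explicit wait may be more robust
            hrefs += expand_tree(driver, root_xpath, child, pause)
        else:
            hrefs.append(link.get_attribute("href"))
    return hrefs
```

It would be called as `expand_tree(driver, "//ul[@id='tree']")` with the real root selector substituted. The returned links could then be opened one by one, which avoids clicking page links while the tree is still being walked.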