Description of the situation: I have a script that scrolls inside a frame in order to extract information.
<ul>
<li> </li>
<li> </li>
<li> </li>
<li> </li>
<li> </li>
...
</ul>
The list holds about 30 items; when scrolling, no new <li> </li> items are added, they are only updated in place. The DOM structure does not grow.
Explaining the problem:
When the script scrolls, it must re-extract all of the <li> </li> elements on every iteration, because they are re-rendered.
Here is the scrolling and extraction logic. The code I use:
SCROLL_PAUSE_TIME = 5

# Get the initial scroll height
last_height = driver.execute_script(
    "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
all_msgs_loaded = False

while not all_msgs_loaded:
    li_elements: List[WebElement] = self._driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    driver.execute_script("document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")
    # Wait for the page to load
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate the new scroll height and compare it with the last scroll height
    new_height = driver.execute_script(
        "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    if new_height == last_height:
        all_msgs_loaded = True
    last_height = new_height
On each iteration, li_elements receives about 30 WebElements. If I comment out the line with find_elements, the script runs for hours without any increase in RAM consumption. I should mention that I do not store anything at runtime and that there is no growth in consumption anywhere else.
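Since the growth appears only when find_elements() runs, every call is apparently materializing ~30 fresh remote WebElement handles per iteration. As a hedged sketch (the helper name extract_item_texts is mine, not from the original script), one way to avoid creating element handles at all is to have the browser serialize the needed data into plain strings, which carry no entry in the driver's element cache:

```python
# Sketch: ask the browser to return plain strings instead of
# WebElement references. `driver` is any Selenium WebDriver instance;
# the CSS selector matches the one used in the question.

EXTRACT_TEXTS_JS = """
return Array.from(
    document.querySelectorAll('li[data-tid="pane-item"]')
).map(li => li.innerText);
"""

def extract_item_texts(driver):
    """Return the visible text of every pane item as a list of str."""
    # execute_script marshals the JS array of strings into a Python
    # list of str; no remote element ids are created on the way.
    return driver.execute_script(EXTRACT_TEXTS_JS)
```

Whether this helps depends on the assumption that the leak comes from the per-call element handles rather than from the page itself.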
Another way I used to get li_elements is through self._driver.execute_script(). Example:

li_elements = self._driver.execute_script(
    "return document.querySelectorAll('li[data-tid=\"pane-item\"]');",
    WebDriverWait(self._edge_driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, "//li[@data-tid='pane-item']"))))
With both methods I get the same result, and the RAM behavior is identical: RAM grows indefinitely until the Task Manager kills the process on its own for safety.
I analyzed the internal structure of these functions, but I did not find anything that should consume RAM.
Another option would be find_elements_by_css_selector(), but internally it just calls find_elements().
I also tried different combinations of sleep(), but nothing helps; RAM does not decrease.
Can you please explain what is actually happening? I do not understand why RAM consumption keeps increasing.
Is there another method of extracting the elements that does not consume RAM?
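As a sketch under the same assumption (helper names are hypothetical, not from the post), the scroll step and the extraction can even be merged into a single execute_script call, so the Python side only ever receives plain numbers and strings and never holds WebElement references between iterations:

```python
# Sketch: one round-trip that scrolls and returns plain data.
# Selectors match the question; the function name is an assumption.

SCROLL_AND_EXTRACT_JS = """
const viewport = document.querySelector('div[data-tid="pane-list-viewport"]');
const items = document.querySelectorAll('li[data-tid="pane-item"]');
if (items.length) { items[0].scrollIntoView(); }
return {
    height: viewport ? viewport.scrollHeight : 0,
    texts: Array.from(items, li => li.innerText),
};
"""

def scroll_and_extract(driver):
    """Scroll one step and return (scrollHeight, list of item texts)."""
    result = driver.execute_script(SCROLL_AND_EXTRACT_JS)
    return result["height"], result["texts"]
```

In the loop from the question, the returned height would replace the separate scrollHeight query and the texts would replace li_elements.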
Comments:
– Andrew (Mar 03 '22 at 06:27): The list of items is only updated from the server.
– Andrew (Mar 03 '22 at 06:36): The list, at each scroll, is only updated from the server.
– undetected Selenium (Mar 03 '22 at 08:26): list of items is only updated from the server: Definitely JavaScript/AJAX is in play.