
Description of the situation: I have a script that scrolls inside a frame in order to extract information.

<ul>

<li> </li>
<li> </li>
<li> </li>
<li> </li>
<li> </li>
...
</ul> 

The list is about 30 <li> </li> items long; when scrolling, no new items are added, the existing ones are only updated. The DOM structure does not grow.

Explaining the problem: when the script scrolls, it must extract all of the <li> </li> elements on every iteration, because they are renewed.

Here is the scrolling and extraction logic. The code I use:

import time
from typing import List

from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

SCROLL_PAUSE_TIME = 5

# Get the initial scroll height of the list viewport
last_height = driver.execute_script("return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")

all_msgs_loaded = False

while not all_msgs_loaded:

    # Re-fetch the <li> elements on every pass, since they are renewed
    li_elements: List[WebElement] = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")

    # Scroll the first <li> into view
    driver.execute_script("document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")

    # Wait for the page to load
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate the new scroll height and compare it with the last one
    new_height = driver.execute_script("return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    if new_height == last_height:
        all_msgs_loaded = True
    last_height = new_height

On each iteration li_elements receives about 30 WebElements. If I comment out the line with find_elements(), the script runs for hours without any increase in RAM consumption. Note that I do not store anything at runtime, and there is no growth in consumption anywhere else.
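
For reference, a minimal sketch of how the growth can be logged per iteration, assuming the psutil package is installed (log_rss is a hypothetical helper, not part of my script; if the growth is in the browser process rather than in Python, it will only show up in Task Manager):

import os
import psutil

def log_rss(label: str) -> None:
    # Resident set size of the current Python process, in MiB
    rss_mib = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
    print(f"{label}: {rss_mib:.1f} MiB")

# inside the while loop:
#     log_rss("before find_elements")
#     li_elements = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
#     log_rss("after find_elements")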

Another way I used to get li_elements is through self._driver.execute_script().

Example:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

li_elements = self._driver.execute_script(
    "return document.querySelectorAll('li[data-tid=\"pane-item\"]');",
    WebDriverWait(self._driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, "//li[@data-tid='pane-item']"))))

Both methods give me the same result, but the RAM growth is also the same: RAM grows indefinitely until the OS kills the process.

I analyzed the internal structure of these functions, but I did not find anything that could inflate the RAM. Another option would be find_elements_by_css_selector(), but internally it calls find_elements().

I also tried different combinations with sleep(), but nothing helps; the RAM does not decrease.

Can you please explain what is actually happening? I do not understand why the RAM consumption keeps increasing.

Is there another method of extracting the elements that does not consume RAM like this?

Andrew

2 Answers


Selenium's find_elements() method by no means should consume that much RAM. Most probably it is the browsing context, i.e. the browser itself, which consumes more RAM while you scrollIntoView(), in case the <li> items get updated through JavaScript or AJAX.

Without any visibility into the DOM tree it would be difficult to predict the actual reason or a remediation. However, a similar discussion suggests using some waits in terms of time.sleep(n).
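
As an illustration only, such a wait could also be phrased as an explicit condition on the scroll height instead of a fixed sleep (a sketch reusing the viewport selector from the question; wait_for_height_change is a hypothetical helper, not a Selenium API):

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

VIEWPORT_HEIGHT_JS = "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;"

def wait_for_height_change(driver, old_height, timeout=20):
    # Block until the viewport's scrollHeight differs from old_height,
    # then return the new height; on timeout, return the old height so
    # the caller can treat "no change" as "all messages loaded".
    try:
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script(VIEWPORT_HEIGHT_JS) != old_height)
    except TimeoutException:
        pass
    return driver.execute_script(VIEWPORT_HEIGHT_JS)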

undetected Selenium
  • Hello @undetected Selenium, thanks for the reply. In my case I use Microsoft Edge. If I comment out the line with `find_elements()`, `scrollIntoView()` works, but the RAM consumption does not increase at all, because the DOM structure does not grow, it is static. The `<ul>` list of `<li>` items is only updated from the server.
    – Andrew Mar 03 '22 at 06:27
  • https://prnt.sc/l7fj8PFQsLOM , here is a screenshot of the `<ul>` list; at each scroll it is only updated from the server.
    – Andrew Mar 03 '22 at 06:36
  • _The `<ul>` list of `<li>` items is only updated from the server_: Definitely JavaScript/AJAX is in play.
    – undetected Selenium Mar 03 '22 at 08:26
  • Yes, you are right. But without `find_elements()`, scrolling doesn't affect RAM at all. – Andrew Mar 03 '22 at 08:53

Try getting just what you need instead of the full element:

lis = driver.execute_script("""
  return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerText)
""")

I can't tell what you're doing with them, but if you're adding elements to a big array, and there are enough of them, you will hit a RAM limit.
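
If you need the markup rather than the text, the same trick works with innerHTML; the browser then hands back plain strings, and no WebElement handles stay alive on the Python side (a sketch only; parsing with BeautifulSoup is assumed from the comments below):

from bs4 import BeautifulSoup

html_chunks = driver.execute_script("""
  return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerHTML)
""")

for chunk in html_chunks:
    soup = BeautifulSoup(chunk, "html.parser")
    # extract what you need here, then let soup and chunk go out of scope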

pguardiario
  • Hello @pguardiario, thanks for the reply. I tried a similar combination: `li_elements = self._driver.execute_script("return [...document.querySelectorAll('li[data-tid=\"chat-pane-item\"] div[class*=\"ui-chat__item__message\"]')].map(div => div.innerHTML);")` – Andrew Mar 03 '22 at 08:48
  • Now I get the same result, but the RAM grows much more slowly, and much less than it did before. At the same point in time the RAM difference is about 600-700 MB. Very strange how that changed things, but it's much better. – Andrew Mar 03 '22 at 08:49
  • I need innerHTML here, because the internal structure is large and then I process it with BeautifulSoup4 – Andrew Mar 03 '22 at 08:52
  • Yeah, that sounds right. If I could see the html I could guess what you need and make a suggestion that's more efficient than parsing html with beautiful soup – pguardiario Mar 03 '22 at 10:55
  • Also, you could just make sure that your object is getting garbage collected inside your while loop, which it should be, but I think you didn't share the full code. – pguardiario Mar 03 '22 at 11:02
  • Hello @pguardiario, I've done a lot of testing, but RAM still fills up slowly. Here is the structure: https://prnt.sc/LDmA-U3Y2b_X. Each element has a large internal structure, and all of it is needed. – Andrew Mar 11 '22 at 19:12
  • Can the garbage collector release li_elements at runtime? I also found a browser option, "Use hardware acceleration (if available)"; this significantly reduced the RAM consumption. – Andrew Mar 11 '22 at 19:17
  • It won't get garbage collected if there's a reference to it in scope – pguardiario Mar 12 '22 at 00:16
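
To illustrate that last point with the loop from the question: rebinding or deleting the name is what lets a batch of wrappers be collected (process() below is a hypothetical placeholder for the real extraction step):

while not all_msgs_loaded:
    # rebinding li_elements drops the previous batch of WebElement
    # wrappers, making them eligible for garbage collection
    li_elements = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    process(li_elements)  # placeholder for whatever is done with the elements
    # drop the reference before sleeping, so this pass's wrappers
    # can be collected right away
    del li_elements
    ...  # scroll, sleep, height check as in the question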