
I'm running a Selenium script in Python with the Firefox browser, where the task is to scroll down to the end of the page. The process consists of scrolling down the page and clicking a button at the bottom, which triggers the page to load more content. These steps should repeat until there is no button left.
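For reference, the loop described above can be sketched roughly as follows. The CSS selector `button.load-more` is a placeholder for whatever locates the real button, and `max_rounds` is an assumed safety cap; the function takes any driver object exposing Selenium's `execute_script` and `find_elements` calls, so with a real `webdriver.Firefox()` instance it would run as-is.

```python
def load_full_page(driver, button_css="button.load-more", max_rounds=100_000):
    """Scroll to the bottom and click the 'load more' button until it
    disappears. Returns the number of clicks performed.

    `button_css` is a placeholder selector; `max_rounds` is a safety cap
    so a button that never disappears cannot loop forever.
    """
    clicks = 0
    while clicks < max_rounds:
        # Scroll to the current bottom of the document.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # "css selector" is the string behind Selenium's By.CSS_SELECTOR.
        buttons = driver.find_elements("css selector", button_css)
        if not buttons:
            break  # no button left: the page is fully loaded
        buttons[0].click()
        clicks += 1
    return clicks
```

In a real run you would likely also add a wait between iterations (e.g. `WebDriverWait`) so the newly loaded content has time to appear before the next scroll.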

It's a slow process that takes several hours to complete. Although this script worked on some other pages, on a longer page I'm stuck on a problem where geckodriver.log reports the following error:

JavaScript error: line 0: uncaught exception: out of memory

What I don't understand is that my computer has plenty of available memory (RAM); when I check with htop, only 40% of memory is in use.

I have two hypotheses:

  1. The memory allocated to the Firefox browser is limited. If this is true, how could I increase the amount of memory?
  2. My script really is exhausting memory, and the system kills some parts when this occurs; by the time I check the memory usage, it has already been freed, misleading me.
  • If it takes hours to scroll down the page, that's already the sign that something's very broken. A long page (say, the equivalent of 200 printed A4 pages) might take a full minute to scroll down through, not an hour, and definitely not several hours. – Mike 'Pomax' Kamermans Aug 28 '23 at 16:34
  • Actually it's normal behavior for it to take this long, because I need to scroll down the page, click a button, and then it loads more content. This process repeats until there's no button left at the bottom of the page. If I did this manually, it would also take hours. – ivarejao Aug 28 '23 at 17:19
  • That's a very different thing from scrolling. It sounds a bit like you're "illegally" webscraping a site instead of using their RSS/API/etc. to get that data, and you're running into the fact that, yes, if you're loading more and more and more data over the course of hours, without pruning the DOM, you're just filling up RAM until the browser crashes. – Mike 'Pomax' Kamermans Aug 28 '23 at 17:46
  • Sounds like a badly designed page (though I think Facebook used to do the same...). It's probably keeping all the items in the DOM and only adding new ones. Eventually the browser is overloaded with content and you run out of resources. Most sites will limit the current DOM to a specific number of items to avoid this. The offscreen items aren't particularly useful to users anyway. – pcalkins Aug 28 '23 at 22:10

1 Answer


As noted in the comments, this is a situation where the page is badly designed. To avoid the out-of-memory problem, I installed the uBlock extension in the Firefox instance driven by geckodriver, which resolved my problem: it blocked about 25% of the page's content, mainly ads.

The implementation is explained in this Stack Overflow answer.
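For context, Selenium's Firefox driver exposes an `install_addon(path, temporary=...)` method that can load a downloaded `.xpi` file into the session. A minimal sketch, assuming you have fetched the uBlock Origin `.xpi` yourself (the path below is a placeholder):

```python
def attach_ublock(driver, xpi_path="/path/to/ublock_origin.xpi"):
    """Install the extension for this browsing session and return the
    addon id Selenium reports.

    `xpi_path` is a placeholder; download the .xpi from the uBlock
    Origin release page. temporary=True scopes the install to this
    session, so the profile geckodriver creates is not permanently
    modified.
    """
    return driver.install_addon(xpi_path, temporary=True)
```

With a real driver this would be called right after `webdriver.Firefox()` is constructed, before the scrolling loop starts.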
