
I have a small piece of code that continuously clicks a button called "See Older Messages" every 500 ms, in order to load infinitely-scrolled content from a webpage. Reasons for doing this are personal, but suffice it to say, I'm trying to automate something that would otherwise take me weeks of non-stop scrolling.

The problem is that the gap between clicks gradually grows well beyond 500 ms as the script runs. After several hours, each click can take 5 seconds or more. I'm assuming this is caused by Facebook throttling my requests after so long, so to prevent it, I want the script to run for a set amount of time, say 2 minutes, followed by a delay of maybe 20 seconds before it runs again for 2 minutes, and so on. How would I go about doing this? I've racked my brains, but my limited knowledge of JavaScript hasn't come up with anything meaningful.

Below is the current code in its entirety.

setInterval(function () {
document.getElementById('see_older').getElementsByClassName('content')[0].click();
}, 500);

Thanks a lot in advance.

Hashim Aziz
  • This could simply be an issue of your browser losing the attention of your operating system by becoming idle. In that case, there would be no way to speed this up from JavaScript. – Travis J Oct 08 '15 at 19:44
  • Could also be a memory issue if all the previous content on the page is still on the page while you're scrolling down – arcyqwerty Oct 08 '15 at 19:45
  • One possible option would be to use an iframe with a youtube video in it and leave the browser tab focused while playing that video and running the script in the background. This could potentially force the OS to retain resources for that browser tab. – Travis J Oct 08 '15 at 19:46
  • @arcyqwerty That is the case, and last I ran the script it was running at 1.7 million K of memory. Would this solution not solve or even mitigate the problem? – Hashim Aziz Oct 08 '15 at 21:14
  • 1.7 million K is 1.7 GB. Chances are your browser can't handle that much data quickly. Since it hasn't crashed already I'm assuming you're not using IE. You might be better off just extracting the content you want into files and saving it to disk, but that's a different question. – Caleb Mauer Oct 08 '15 at 22:21
  • @CalebMauer - Would the solution of implementing a delay not make much of a difference then? What I'm trying to do is make visible all Facebook messages from a given user - there are just over 100,000 in the thread. The only way to do that short of scrolling up manually is using this method. I'm not sure what you mean by extracting content, but I can't extract content that's not been loaded yet - once all messages in the thread have been loaded using the script, it'll be a simple matter of saving the webpage. And yeah, I'm using Chrome - I'm more surprised my computer hasn't crashed tbh. – Hashim Aziz Oct 08 '15 at 22:56
  • No, I don't think this delay will help. I just added a recommendation to my answer to use the **Graph API** instead. That is probably your best bet. Or you can just get a really powerful computer and wait a long time for the page to finish loading. – Caleb Mauer Oct 08 '15 at 23:06
  • Yeah, depending on what you're trying to do, you can scrape a lot of data off the Graph API (access to which they do rate-limit). You may not be able to get everything you can normally see since there are some limitations, but it's a good start to get the most common items. If you find that you do need to scrape directly, a way to reduce memory usage is to use JS to delete the passed items from the DOM (i.e. delete the previous page of data when you load the next). – arcyqwerty Oct 08 '15 at 23:51
  • In my case, because I need to have all of the messages loaded at once, deleting previous data isn't an option. According to another forum, that's why the current script is so slow after a while, because I'm having to load so many nodes in a single session, which, of course, ups the memory. – Hashim Aziz Oct 09 '15 at 19:46

2 Answers

  1. Keep track of when the script running started
  2. While it's been less than 2 mins, keep clicking every 500ms.
  3. After running for ~2 mins, stop and queue next run in 20s.
  4. Go to step 2.

-

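var lastChange;

// Declared as a normal function so that doClick can refer to it by name;
// the name of a named function expression is not visible outside its own body.
function runScript() {
  lastChange = new Date();
  doClick();
}

function doClick() {
  if (new Date() - lastChange < 120000 /* 2 mins */) {
    document.getElementById('see_older').getElementsByClassName('content')[0].click();
    setTimeout(doClick, 500);
  } else {
    setTimeout(runScript, 20000 /* 20s */);
  }
}

runScript();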

-

I recommend using setTimeout over setInterval since, if the browser takes a while to execute, loses focus and stops executing JS, gets paged out, etc., then you will still get the time spacing between events that you want. See https://stackoverflow.com/a/731625/1059070.
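As a rough sketch of the same chained-setTimeout idea, the burst-and-pause timing can be pulled into a small helper so the durations are easy to change. The names `nextStep` and `startBursts` and the option names `interval`, `runFor` and `pauseFor` are made up for this illustration; the element lookup in the usage comment is copied from the question and assumes the page still has that structure.

```javascript
// Hypothetical helper (not part of the answer above): given when the current
// burst started and the current time, decide whether to click and how long
// to wait before the next check. All times are in milliseconds.
function nextStep(burstStart, now, opts) {
  if (now - burstStart < opts.runFor) {
    // Still inside the burst: click, then check again after the short interval.
    return { click: true, delay: opts.interval, burstStart: burstStart };
  }
  // Burst is over: skip the click, wait out the pause, and treat the end of
  // the pause as the start of the next burst.
  return { click: false, delay: opts.pauseFor, burstStart: now + opts.pauseFor };
}

// Drives nextStep with chained setTimeout calls. `action` is whatever should
// happen on each tick, e.g. the click from the question.
function startBursts(action, opts) {
  var burstStart = Date.now();
  (function tick() {
    var step = nextStep(burstStart, Date.now(), opts);
    burstStart = step.burstStart;
    if (step.click) action();
    setTimeout(tick, step.delay);
  })();
}

// Possible usage on the page from the question (commented out because it
// needs that page's DOM to run):
// startBursts(function () {
//   document.getElementById('see_older').getElementsByClassName('content')[0].click();
// }, { interval: 500, runFor: 120000, pauseFor: 20000 });
```

Keeping the decision in `nextStep` separate from the timer makes the timing easy to check by hand without waiting two minutes.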

arcyqwerty

Toggle whether or not your function does anything by setting another timer.

/* When true, keep loading; when false, skip the click. */
window.doLoad = true;

setInterval(function () {
    if (window.doLoad) {
        document.getElementById('see_older').getElementsByClassName('content')[0].click();
    }
}, 500);

/* This will toggle doLoad every two minutes. */
setInterval(function () {
    window.doLoad = !window.doLoad;
}, 120000); // two minutes in milliseconds
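
The toggle above is meant to flip every two minutes in both directions, so the pause would also last two minutes rather than the 20 seconds asked for. One possible way to get uneven on and off periods, sketched here with a made-up helper called `isActive`, is to work out the current phase from the elapsed time instead of flipping a flag.

```javascript
// Hypothetical helper: returns true while inside the "on" part of a
// repeating on/off cycle. All arguments are in milliseconds.
function isActive(elapsed, onFor, offFor) {
  return elapsed % (onFor + offFor) < onFor;
}

// Sketch of how it could gate the click from the question (commented out
// because it needs that page's DOM to run):
// var start = Date.now();
// setInterval(function () {
//   if (isActive(Date.now() - start, 120000, 20000)) {
//     document.getElementById('see_older').getElementsByClassName('content')[0].click();
//   }
// }, 500);
```

With 120000 and 20000 as the two durations, the cycle repeats every 140 seconds: two minutes of clicking followed by a 20-second pause.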

In your case though you might be better off using the Facebook Graph API.

Caleb Mauer
  • What's doLoad? I can't find a reference to it anywhere. – Hashim Aziz Oct 08 '15 at 20:04
  • doLoad is a new variable that is added to the window object by this code. It's not documented, I just made it up. – Caleb Mauer Oct 08 '15 at 22:17
  • This might be a newbie question, but where's it being defined - in the first line? Don't all variables have to be defined with `var`? – Hashim Aziz Oct 08 '15 at 22:59
  • Doesn't have to be defined with `var` if you're attaching it to an object like this. That's the power of JavaScript (also part of what makes it dangerous). `var` is for when it's not obvious if you're referencing a global or trying to create a new local variable. – Caleb Mauer Oct 08 '15 at 23:02
  • As for the Graph API, I have a feeling dealing with APIs at this moment in time with my basic knowledge of pure JS would be too difficult, especially seeing as how most of what's described in that question goes over my head, and I don't really have the time to learn just yet seeing as how quickly I need these messages. Tbh, the current method - while crude, and the JS equivalent of brute-forcing - seems to work fine for the most part; I've reached most of the way through the thread before something unexpected throws me off, so it just feels like I need to iron out the kinks. – Hashim Aziz Oct 08 '15 at 23:16
  • OK, good luck. Sometimes you just have to get it done, can't spend forever trying to get it perfect. – Caleb Mauer Oct 08 '15 at 23:20