I am trying to process the visible text of very large pages, using the whole of Orwell's "1984" on this page as an example, but my Chrome tab seems to crash when I try the following operation.
var script = document.createElement('script');
script.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(script);
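// grab the visible text of the page and split it into words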
var allWords = $(document.body).children(":visible").text().split(' ');
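// keep only the first occurrence of each word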
var uniqueWords = allWords.filter(function(elem, i, array){ return array.indexOf(elem) === i });
The last operation makes my Chrome tab unresponsive: I stop getting output for new commands I enter for at least a minute. Note: the first part of the snippet just attaches jQuery to the page.
How would you process large strings like this much, much faster? Do you think I should randomly sample from allWords and only apply the filter function to that smaller array?
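To make the sampling idea concrete, something like the sketch below is what I had in mind (sampleSize and sampledWords are just placeholder names I made up, and the sample size is an arbitrary guess):

// rough sketch: pick sampleSize random words from allWords (with replacement)
// and only deduplicate that subset
var sampleSize = 10000;
var sampledWords = [];
for (var i = 0; i < sampleSize; i++) {
    sampledWords.push(allWords[Math.floor(Math.random() * allWords.length)]);
}
var uniqueSample = sampledWords.filter(function(elem, i, array){ return array.indexOf(elem) === i });

Of course this would only deduplicate a subset of the words rather than all of them, so I am not sure it is the right approach.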