Executing scraped JavaScript with cheerio

Question

I have a web page that exposes some JS functions which don't alter the DOM but return some numbers. I'd like to write a Node.js application that downloads such pages and executes those functions in the context of the downloaded page.

I was looking at cheerio for page scraping, but while I can see how easy it is to navigate and manipulate the DOM with it, I don't see any way to run the page's functions. Is it possible to do so?

Should I look, instead, at jsdom?

[This](http://stackoverflow.com/a/7978072/2172543) is the best SO answer I've found so far for your question. It's not strictly about executing a web page's JavaScript; it's about HTML parsing. – Marcel, Mar 24 '13 at 17:43

score 5 · Answer 1 · answered Aug 28 '13 at 01:52

Sounds like you want to use PhantomJS, which will provide the fully rendered output, and then use cheerio on that.

Mark Selby


These days you want Puppeteer. – pguardiario May 16 '19 at 03:11

score 0 · Answer 2 · answered Feb 22 '13 at 18:27

Cheerio and jsdom are both HTML scrapers and have no notion of executing JavaScript. If the API you wish to access is written in JavaScript, there is little to prevent you from extracting its functions and running them inside Node. Beware, though: downloading and executing arbitrary JavaScript can pose a huge security risk. If you want to simulate the behaviour of a browser, look at http://phantomjs.org/. This is a headless browser, scriptable with JavaScript, that can do most of what an ordinary browser can.

Deathspike


Note that if you do want to run JS safely in Node, it's perfectly doable via the `vm` module that has a `runInContext` method that is completely isolated from the rest of your code (but can still hog resources). – Benjamin Gruenbaum May 11 '14 at 20:32
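For illustration, here is a minimal sketch of the `vm` approach mentioned in the comment above. The page source, the function names `getAnswer` and `scaled`, and the helpers `extractInlineScripts` and `runPageScripts` are all hypothetical, and the regex-based extraction is a simplification (an HTML parser such as cheerio would be more robust). The scripts run without a DOM, so this only suits functions that never touch `document` or `window`. Note also that the Node.js documentation states that `vm` is not a security mechanism, so this should not be treated as a safe sandbox for untrusted pages.

```javascript
const vm = require('vm');

// Hypothetical page source standing in for a downloaded document.
const html = `
<html><body>
<script>
  function getAnswer() { return 6 * 7; }
</script>
<script>
  var factor = 2;
  function scaled(n) { return n * factor; }
</script>
</body></html>`;

// Naive extraction of inline <script> bodies (assumption: no external
// scripts and no "</script>" inside string literals).
function extractInlineScripts(source) {
  const pattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// Run every inline script in one shared context, so that later scripts
// and our own calls can see the globals defined by earlier ones.
function runPageScripts(source) {
  const context = vm.createContext({});
  for (const code of extractInlineScripts(source)) {
    // The timeout limits runaway synchronous code; it is not a full
    // resource limit.
    vm.runInContext(code, context, { timeout: 1000 });
  }
  return context;
}

const page = runPageScripts(html);

// Call the page's functions inside the same context.
const answer = vm.runInContext('getAnswer()', page, { timeout: 1000 });
const scaledValue = vm.runInContext('scaled(10)', page, { timeout: 1000 });

console.log(answer, scaledValue); // prints: 42 20
```

If the page's functions depend on the DOM, this approach will fail with reference errors, and a tool that provides a document (jsdom, or a headless browser as suggested in the answers) is the better fit.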

jsdom **is not** just an HTML scraper with no notion of executing JavaScript. See the docs: [Initialization lifecycle](https://github.com/tmpvar/jsdom/blob/master/README.md#initialization-lifecycle) and [For the hardcore: jsdom.jsdom](https://github.com/tmpvar/jsdom/blob/master/README.md#for-the-hardcore-jsdomjsdom) – rsp Jul 30 '14 at 23:39
