I'm attempting to write some JavaScript code (in particular, a Chrome extension) which does the following:
1. Retrieve some web page's contents via AJAX.
2. Get some content from that page by locating certain elements inside of the HTML string and getting their contents.
3. Do a thing with that data.
I have 1) and 3) working, but I'm having some trouble achieving step 2) in a reasonable way.
I currently have 2) implemented via jQuery(htmlString) and then using normal jQuery selectors, etc. to extract the data I want. The problem is that jQuery actually adds the retrieved HTML to the current page, loading and executing all external resources / scripts in the process. This is obviously bad.
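For reference, here is a minimal sketch of roughly what my current step 2) looks like. The function name extractWithJQuery, the selector, and the idea of passing jQuery in as the jq parameter are just for illustration here; in the real extension I simply use the global jQuery.

```javascript
// Sketch of the current approach (illustrative names, not the exact extension code).
// `jq` is the jQuery function; it is passed in so the snippet does not assume a global.
function extractWithJQuery(jq, htmlString, selector) {
  // jQuery(htmlString) turns the string into real DOM nodes tied to the current page.
  // As described above, this is the step where resources referenced by the
  // retrieved markup get loaded, which is exactly what I want to avoid.
  const $nodes = jq(htmlString);

  // .find() only searches descendants of the top-level parsed nodes.
  const $match = $nodes.find(selector).first();

  // I need both the text and the inner HTML of the matched element.
  return { text: $match.text(), html: $match.html() };
}

// Intended usage inside the extension, once the AJAX call has returned:
//   const data = extractWithJQuery(jQuery, htmlString, 'div.content h1');
```

The extraction itself works fine; the only problem is the side effects of the jq(htmlString) call.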
So I'm looking for a way to get the text and HTML in certain tags inside my HTML string without:
- Loading or executing ANY scripts or resources (images, CSS, etc.) referenced in the HTML string.
- Trying to remove external resources with regular expressions, since we all know what happens when you parse [X]HTML with regex.
I believe that I can achieve what I want using jsdom and jQuery, since jsdom has a FetchExternalResources option which can be set to false. However, jsdom seems to only work in NodeJS, not in the browser.
Is there any reasonable way to do this?