
I want to create a script that retrieves the HTML of several pages on a website and parses that DOM content to extract the data I want.

The reason I want to do this with JavaScript is that I want to use jQuery's Sizzle selector engine to easily query the DOM and retrieve the information.

However, because of the cross-domain (same-origin) policy that most browsers enforce, I haven't found a solution yet. I came across JSONP, but since that site doesn't explicitly support it, I can't use that approach.

I also thought about using iframes, but jQuery doesn't seem to be able to read their content either...

So my question really comes down to this: is there a way to get the DOM of a remote web page using JavaScript/Ajax/jQuery? Are there libraries that allow this?

dominicbri7
  • You'll want to use something like PHP to grab the source for you first. – MrHunter Aug 12 '14 at 20:56
  • I highly recommend a simple userscript, using Greasemonkey or Tampermonkey. PHP will get you far, but its DOM API is poor compared to jQuery's (it's as bad as HTML's), and it can't handle dynamic data or templating. At that point you have to start learning things like Node.js-based fake browsers, when a few lines of plain browser JS in a userscript would have done everything you wanted. – dandavis Aug 12 '14 at 20:58
  • @MrHunter I guess so. I could put the content of each page in a hidden DIV and then parse each of them on the client side with jQuery. But I am still wondering whether there is any way to make this work using JS/jQuery only. – dominicbri7 Aug 12 '14 at 20:58
  • @dandavis Yeah, this is why I want to do the parsing on the client side: jQuery's selector engine is so powerful. I wouldn't mind if PHP were the one getting the content in the first place (even though I'd still like to know whether there is a way using JS!). – dominicbri7 Aug 12 '14 at 21:00
  • @dominicbri7: Well, just use one of the "monkey" extensions I mentioned to seamlessly side-step the normal origin limits. In short, you can run your code on their site with your normal sign-in credentials, and since the script runs on their site, it is in the same origin and you're golden. Without browser extensions, you can also use YQL as a server to fetch HTML from other sites to a domain you control. You can also use a bookmarklet as a one-shot userscript. – dandavis Aug 12 '14 at 21:01

1 Answer


No, there is no way to read data from another domain with client-side script unless that domain allows it (for example via CORS headers or JSONP).

You should look for a solution that reads the data on the server side; you can then use it on the client side however you want.
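
As a rough illustration of the server-side approach, here is a minimal sketch of a relay written in JavaScript for Node.js rather than PHP. It assumes Node.js 18 or later (for the built-in `fetch`). The names `ALLOWED_HOSTS`, `isAllowedTarget`, `createRelay`, the `url` query parameter, and the host `example.com` are all hypothetical placeholders, not part of any library. The relay downloads the remote page on the server and hands the HTML back to a page on your own origin, so the browser's cross-domain restriction never comes into play.

```javascript
// Sketch only: assumes Node.js 18+ so that the global fetch() is available.
const http = require("http");

// Hypothetical allow-list: relay only the hosts you actually intend to read.
const ALLOWED_HOSTS = new Set(["example.com"]);

// Returns true only for http(s) URLs whose hostname is on the allow-list.
function isAllowedTarget(rawUrl, allowedHosts = ALLOWED_HOSTS) {
  try {
    const url = new URL(rawUrl);
    const isWeb = url.protocol === "http:" || url.protocol === "https:";
    return isWeb && allowedHosts.has(url.hostname);
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Builds (but does not start) a server that answers requests of the form
// ?url=<page> by fetching <page> on the server and returning its HTML.
function createRelay(allowedHosts = ALLOWED_HOSTS) {
  return http.createServer(async (req, res) => {
    const target = new URL(req.url, "http://localhost").searchParams.get("url");
    if (!target || !isAllowedTarget(target, allowedHosts)) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Target not allowed");
      return;
    }
    try {
      const remote = await fetch(target);
      const html = await remote.text();
      // Sent as plain text so the caller decides how to parse it.
      res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
      res.end(html);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Fetch failed");
    }
  });
}

// To try it locally: createRelay().listen(3000);
```

If the relay is served from the same origin as your page, the client can request it with `$.get` and turn the returned string into nodes with `$.parseHTML` before running selectors on it. The allow-list is a design choice of this sketch: without it, the relay would fetch any URL a visitor supplies.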

Raab