1

I'm really new to JavaScript/jQuery. I've coded in Objective-C and Swift before, where it was possible to parse an (X)HTML website with XPath and a framework like Hpple.

Now I have to do something similar in JavaScript (Cloud Code from parse.com).

What I'd like to be able to write is something like this:

var url = "http://www.google.com";
var xpath = "//body";
someJavaScriptMagic.parse(url, xpath);

I've often seen people use the document.evaluate method, but they were parsing the page they were currently on, not another website.

Is there a way to do that?

I don't know if it's important, but I'm using Cloud Code from parse.com.

EDIT:

I've already tried using an AJAX request:

$.ajax({ url: 'http://www.digitec.ch', success: function(data) { alert(data); } });

But I get the following error each time:

XMLHttpRequest cannot load http://www.digitec.ch/. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://fiddle.jshell.net' is therefore not allowed access.
Christian

3 Answers

1

From JavaScript running in a browser, you can't read the response of an AJAX request (i.e., an HTTP request made from JavaScript) to a different origin than the one that served the page, unless the target server explicitly allows it. In other words, if your page and its script are served from "foo.com" and the script attempts to fetch "google.com", the request will fail. This is called the same-origin policy, and it is a fundamental principle in web application security. Read about it here: http://en.wikipedia.org/wiki/Same-origin_policy. The "Access-Control-Allow-Origin" header from your error is how a server opts in to cross-origin access (CORS); searching for it will give you much more information about this as well.

You can work around this by making a request to a script on your own domain that serves as a proxy. For example:

foo.com/some.js

var url = "http://www.google.com";
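// Encode the target so its own "?", "&" and "/" survive as a single query parameter.
someJavaScriptMagic.get("http://foo.com/fetchUrl?url=" + encodeURIComponent(url));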

You would then have a backend script that accepts that request, makes an HTTP request to the host specified by the query parameter "url", and returns the HTML.
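As a rough illustration of the proxy idea, here is a minimal sketch in plain JavaScript. The names `buildProxyUrl` and `handleFetchUrl`, the `/fetchUrl` path, and the `foo.com` host are assumptions made up for this example and do not belong to any library. The handler is written as a framework-agnostic function, so it would still need to be wired into whatever server or Cloud Code endpoint you actually use, and it assumes a global `fetch` (Node 18+) unless another implementation is passed in.

```javascript
// Hypothetical helpers for illustration only; not part of any library.

// Client side: build the URL of the proxy endpoint on your own domain.
function buildProxyUrl(proxyEndpoint, targetUrl) {
  return proxyEndpoint + "?url=" + encodeURIComponent(targetUrl);
}

// Server side: read the "url" query parameter, fetch that page, return its HTML.
// fetchImpl defaults to the global fetch (assumed available, e.g. Node 18+).
async function handleFetchUrl(requestUrl, fetchImpl = fetch) {
  // The base host is only needed to parse a relative request path.
  const target = new URL(requestUrl, "http://foo.com").searchParams.get("url");

  let parsed;
  try {
    parsed = new URL(target);
  } catch {
    return { status: 400, body: "Missing or invalid url parameter" };
  }

  // Only proxy plain web URLs; reject file:, data:, and similar schemes.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400, body: "Only http and https URLs are supported" };
  }

  const response = await fetchImpl(parsed.href);
  return { status: response.status, body: await response.text() };
}
```

Because the request to the target site is made by the server rather than by the browser, the same-origin policy does not apply to it. A real proxy would likely also want to restrict which hosts it is willing to fetch.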

joeltine
  • In the .NET Framework, it's pretty easy to grab XHTML or any valid XML via a server-side script and search it using XPath. The HTML (non-XML-compliant) variant is likely a little trickier. I've implemented something similar in the past using the HtmlAgilityPack, which sort of implements XPath expressions (its attribute searches used to be odd; I don't know if they ever fixed that). – Mark Rabjohn Jan 15 '15 at 17:55
0

Take a look at this thread for how to fetch the HTML from a URL.

You can use the jQuery function parseHTML to convert a string into a bunch of DOM objects, and then select elements from those DOM objects.

If you insist on using XPath then you might want to take a look at document.evaluate, or this thread.
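Once the HTML has been fetched as a string, evaluating an XPath expression against it could look roughly like the sketch below. It assumes a browser environment and swaps in the standard `DOMParser` for jQuery's parseHTML, because `evaluate` needs a document to run against; the function name `selectByXPath` is made up for this example.

```javascript
// Hypothetical helper: parse an HTML string and return the nodes matching an XPath.
// Browser-only sketch: DOMParser and XPathResult are not available in plain Node.
function selectByXPath(html, xpath) {
  // Build a detached document from the string; scripts in it are not executed.
  const doc = new DOMParser().parseFromString(html, "text/html");

  // A snapshot result stays valid even if the document changes afterwards.
  const result = doc.evaluate(
    xpath,
    doc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );

  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

For example, `selectByXPath(data, "//body")` inside the AJAX success callback would return the parsed body element, provided the request itself was allowed to complete.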

Jens
  • My problem is that when I use the function from the linked thread in jsfiddle, I get this error: XMLHttpRequest cannot load https://www.digitec.ch/. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://fiddle.jshell.net' is therefore not allowed access. – Christian Jan 10 '15 at 15:46
  • From this. http://stackoverflow.com/questions/6375461/get-html-code-using-javascript-with-a-url – Christian Jan 10 '15 at 16:01
  • If you can use jQuery then the [ajax](http://api.jquery.com/jquery.ajax/) call should suffice, no? Is [digitec](https://www.digitec.ch/) the website you want to scrape? – Jens Jan 10 '15 at 16:08
  • I've tried this function: $.ajax({ url: 'http://www.digitec.ch', success: function(data) { alert(data); } }); But the error is the same. I'm trying this in Chrome. – Christian Jan 10 '15 at 16:11
  • A quick search gives you [this answer](http://stackoverflow.com/questions/20035101/no-access-control-allow-origin-header-is-present-on-the-requested-resource). – Jens Jan 10 '15 at 16:15
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/68555/discussion-between-c0dev-and-jens). – Christian Jan 10 '15 at 16:20
0

I think that SlimerJS will help you.

Valee