However you approach this, you have to get around the same-origin policy. Perhaps the easiest way is to put a simple PHP script on your own server that fetches a URL and returns its contents.
Depending on where you want to do the work (i.e. which language you are more comfortable in), you can parse the result on the client or on the server.
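Whichever strategy you pick, the page has to hand the target URL to the proxy script, so it is worth checking and encoding it first. Here is a minimal sketch, assuming the proxy lives at /php_fetch_url.php and reads a `url` query parameter; `buildProxyUrl` is a hypothetical helper name, and the http/https allow-list is my assumption. A client-side check like this is a convenience only and does not replace sanitizing the URL on the server.

```javascript
// Hypothetical helper: builds the request URL for the proxy script.
// Returns null when the target is not an absolute http(s) URL.
function buildProxyUrl(proxyPath, targetUrl) {
  var parsed;
  try {
    parsed = new URL(targetUrl); // throws on malformed or relative URLs
  } catch (e) {
    return null;
  }
  // Assumed allow-list: only plain web URLs (rejects file:, ftp:, javascript:, ...)
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null;
  }
  // Encode the whole target so its own query string survives the round trip
  return proxyPath + '?url=' + encodeURIComponent(parsed.href);
}
```

For example, `buildProxyUrl('/php_fetch_url.php', 'file:///etc/passwd')` gives `null`, so the request is never sent.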
CLIENT PARSING STRATEGY
(working fiddle)
If you want to do the work in jQuery, your simple PHP script will look something like this:
<?php
// you could do this with curl too, plenty of tuts on that topic
$url = $_GET['url']; //todo: sanitize this!
print file_get_contents($url);
Then you would parse the result client side like so:
jQuery(function($) {
// given an html response, extract the title
function getTitle(data) {
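// [\s\S] also matches newlines; [^>]* allows attributes on <title>
var matches = data.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
// match() returns null when nothing matches, so guard before indexing
return matches ? matches[1] : '';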
}
// find the body tag of an element
// because browsers parse the innerHtml differently
// (http://stackoverflow.com/questions/2488839/does-jquery-strip-some-html-elements-from-a-string-when-using-html)
// we can't rely on just $(data) to do this right
function getBody(data) {
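// [\s\S] also matches newlines; [^>]* allows attributes on <body>
var matches = data.match(/<body[^>]*>([\s\S]*)<\/body>/i);
// wrap in a div so a body starting with plain text is not
// treated as a selector (jQuery.parseHTML needs jQuery 1.8+)
return $('<div>').append($.parseHTML(matches ? matches[1] : ''));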
}
// given an html response, extract a description
function getDesc(data) {
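// find() only searches descendants, and <meta> ends up as a top-level
// node of $(data), so wrap the parsed nodes in a container first
// (jQuery.parseHTML needs jQuery 1.8+)
var $data = $('<div>').append($.parseHTML(data));
var $match = $data.find('meta[name=description]');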
if ($match.length) {
return $match.attr('content');
}
var $body = getBody(data);
// the g flag replaces every newline, not just the first one
return $body.text().substring(0, 255).replace(/\n/g, ' ');
}
// this url points to a proxy (PHP) script on your server,
// which uses file_get_contents, curl or a similar operation
// to retrieve the url's contents
$.ajax('/php_fetch_url.php', {
data: {
url: 'http://www.somedomain.to/fetch/'
},
success: function(data, status, xhr) {
// assumes your debugger console (e.g. Firebug) is opened!
console.log(data);
console.log(status);
console.log(xhr);
console.log('title='+getTitle(data));
console.log('desc='+getDesc(data));
},
type: 'GET',
error: function(xhr, status, err) {
console.log(status);
console.log(err);
},
dataType: 'text'
});
});
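If you would rather not depend on how a particular browser builds DOM nodes from the response, the same extraction can be done with string matching alone. Below is a minimal sketch under that assumption; `extractMeta` and `getAttr` are hypothetical names, not part of jQuery, and regex-based HTML parsing only covers the common cases (for instance, it gives up on a `>` inside an attribute value).

```javascript
// Hypothetical helper: read one quoted attribute from a single tag string.
function getAttr(tag, attrName) {
  var re = new RegExp('\\s' + attrName + '\\s*=\\s*(?:"([^"]*)"|\'([^\']*)\')', 'i');
  var m = tag.match(re);
  if (!m) {
    return null;
  }
  // Group 1 is the double-quoted value, group 2 the single-quoted one
  return m[1] !== undefined ? m[1] : m[2];
}

// Sketch: extract title and meta description from raw HTML without a DOM.
function extractMeta(html) {
  var title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  var description = '';
  // Inspect each <meta ...> tag so attribute order does not matter
  var metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (var i = 0; i < metaTags.length; i++) {
    var name = getAttr(metaTags[i], 'name');
    var content = getAttr(metaTags[i], 'content');
    if (name !== null && content !== null && name.toLowerCase() === 'description') {
      description = content;
      break;
    }
  }
  return {
    title: title ? title[1].trim() : '',
    description: description
  };
}
```

You could call `extractMeta(data)` inside the `success` callback in place of `getTitle` and `getDesc`; unlike `getDesc`, this sketch does not fall back to the body text when no description is present.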
SERVER PARSING STRATEGY
If you feel more comfy in PHP, or want the more efficient and secure approach (the page only receives the two strings it needs, never the raw remote HTML), then you can do the work in PHP and return a JSON object. Your PHP script will look something like this:
<?php
function fetchContent($url) {
//todo: sanitize $url!
return file_get_contents($url);
}
function fetchTitle($content) {
preg_match('@<title>([^<]+)</title>@m', $content, $matches);
return count($matches) > 1? $matches[1] : '';
}
function fetchBody($content) {
// the s modifier lets . match newlines; [^>]* allows attributes on <body>
return preg_replace('@.*<body[^>]*>(.*)</body>.*@si', '$1', $content);
}
function fetchDesc($content) {
preg_match('@<meta\s+name=[\'"]description[\'"]\s+content=[\'"]([^\'"]+)[\'"]@i', $content, $matches);
if( count($matches) > 1 ) { return $matches[1]; }
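$body = fetchBody($content);
// no meta description: fall back to the first 255 characters of the body text
return str_replace("\n", ' ', substr(strip_tags($body), 0, 255));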
}
$content = fetchContent($_GET['url']);
// you may need to install json
// http://us.php.net/json
print json_encode( array("title" => fetchTitle($content), "description" => fetchDesc($content)) );
And then your js code will look something like this:
jQuery(function($) {
$.ajax('/php_fetch_url.php', {
// A CRUCIAL CHANGE!
dataType: 'json',
data: {
url: 'http://www.somedomain.to/fetch/'
},
success: function(data, status, xhr) {
// assumes your debugger console (e.g. Firebug) is opened!
console.log('title='+data.title);
console.log('desc='+data.description);
},
type: 'GET',
error: function(xhr, status, err) {
console.log(status);
console.log(err);
}
});
});
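Since the fetched page may have no title or description, it can help to tidy the JSON before showing it. A minimal sketch, assuming the response has the `title` and `description` fields shown above; `normalizePreview` is a hypothetical helper, and the 255-character default simply mirrors the limit used in the client example.

```javascript
// Hypothetical helper: tidy the { title, description } object
// returned by the proxy before displaying it.
function normalizePreview(data, maxLength) {
  var limit = maxLength || 255;
  function clean(value) {
    // Treat missing or non-string fields as empty, collapse whitespace runs
    return (typeof value === 'string' ? value : '').replace(/\s+/g, ' ').trim();
  }
  var safe = data || {};
  return {
    title: clean(safe.title),
    description: clean(safe.description).substring(0, limit)
  };
}
```

Inside the `success` callback you might write `var preview = normalizePreview(data);` and then log or render `preview.title` and `preview.description`.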