0

Is there a way to parse html content using javascript?

I have a requirement to display only a div from some other site into my site. Is that possible? For example consider I want to show only div#leftcolumn of w3schools.com in my site. Is this even possible?

How can I do the same using javascript or jQuery?

Thanks.

Control Freak
Harshdeep
  • ever tried to `.load()` that content into a hidden div and then work with your selector eg `$('div#leftcolumn', $hiddendiv)`? –  Jul 04 '12 at 18:23
  • 6
    You couldn't because of XSS protection – bksi Jul 04 '12 at 18:24
  • btw, what have you tried? show us some code ... etc pp –  Jul 04 '12 at 18:24
  • @bksi that is not true - see my comment :) –  Jul 04 '12 at 18:24
  • 1
    @AndreasNiedermair yes it's true :x – Esailija Jul 04 '12 at 18:24
  • 1
    Have a read about the [same origin policy](https://developer.mozilla.org/En/Same_origin_policy_for_JavaScript). – James Allardice Jul 04 '12 at 18:25
  • @JamesAllardice thanks for pointing this out! nevertheless you can use a proxy on your domain - so it is possible! –  Jul 04 '12 at 18:27
  • @AndreasNiedermair I am trying something like `$(document).ready(function(){ var tempDiv = $("<div></div>"); tempDiv.load("http://community.adobe.com/help/search.html?q=indesign.htm", function() { var content = tempDiv.find("#keymatch"); }); });` – Harshdeep Jul 04 '12 at 18:36
  • @Harshdeep you did not read the answers here ... each of them has a critical passage in it: Same-origin-policy! think about this, and you can answer your question on your own! –  Jul 04 '12 at 18:39

5 Answers

3

You need to have a look at Same Origin Policy:

In computing, the same origin policy is an important security concept for a number of browser-side programming languages, such as JavaScript. The policy permits scripts running on pages originating from the same site to access each other's methods and properties with no specific restrictions, but prevents access to most methods and properties across pages on different sites.

For you to be able to get the data, the other page must share the:

Same protocol, host, and port

Otherwise you need JSONP to work around it, and that only works if the other site exposes a JSONP endpoint.
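
To make the rule above concrete, here is a minimal sketch in plain JavaScript, assuming a runtime that provides the standard `URL` class (modern browsers, Node.js). `isSameOrigin` is a hypothetical helper name and the example.com URLs are placeholders. The browser performs this comparison itself, so this is only an illustration of what counts as "the same origin":

```javascript
// Hypothetical helper (not part of jQuery or any library): two URLs are
// same-origin when scheme, host, and port all match.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Placeholder URLs, for illustration only.
console.log(isSameOrigin('http://example.com/a.html', 'http://example.com/b/c.html')); // true
console.log(isSameOrigin('http://example.com/', 'https://example.com/'));              // false (protocol differs)
console.log(isSameOrigin('http://example.com/', 'http://www.example.com/'));           // false (host differs)
console.log(isSameOrigin('http://example.com/', 'http://example.com:8080/'));          // false (port differs)
```

Only the first pair could be fetched with a plain Ajax call. The other three would need JSONP or a server-side request.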


When the page is on the same protocol and host, jQuery's load() function can fetch just the fragment you want:

$('#foo').load('somepage.html div#leftcolumn', function(){
  // loaded
}); 

Another possible solution (untested) is to fetch the page with a server-side language, in which case you don't need JSONP. Here is an example in PHP.

1) Create a PHP page named ajax.php and put the following code in it:

<?php
  $content = file_get_contents("http://w3schools.com");
  echo $content ? $content : '0'; 
?>

2) On some page, put this code:

$('#yourDiv').load('ajax.php div#leftcolumn', function(data){
    if (data !== '0') { /* loaded */ }
}); 

Make sure that:

  • you specify the correct path to the ajax.php file
  • you have allow_url_fopen turned on in php.ini
  • you replace yourDiv with the id of the element you want to put the received content in
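
For readers not using jQuery, the selector step that `.load('ajax.php div#leftcolumn')` performs can be sketched in plain JavaScript. This is an illustration under assumptions, not the answer's own method: it assumes a browser with the built-in `DOMParser` and `fetch`, it reuses the `ajax.php` proxy from step 1, and `extractOuterHtml` and `showLeftColumn` are hypothetical names.

```javascript
// Hypothetical helper: return the markup of the first element matching
// `selector` in an HTML string, or null when nothing matches.
// `parser` is optional and exists only so the function can be exercised
// outside a browser; by default the browser's built-in DOMParser is used.
function extractOuterHtml(html, selector, parser) {
  var domParser = parser || new DOMParser();
  var doc = domParser.parseFromString(html, 'text/html');
  var node = doc.querySelector(selector);
  return node ? node.outerHTML : null;
}

// Browser-only usage sketch: fetch the proxied page from your own server
// (ajax.php from step 1) and inject only div#leftcolumn into the target.
function showLeftColumn(targetId) {
  return fetch('ajax.php')
    .then(function (response) { return response.text(); })
    .then(function (html) {
      var fragment = extractOuterHtml(html, 'div#leftcolumn');
      if (fragment !== null) {
        document.getElementById(targetId).innerHTML = fragment;
      }
      return fragment;
    });
}
```

Calling `showLeftColumn('yourDiv')` after the page has loaded would fill the element with the fetched fragment. If ajax.php returns '0' because the fetch failed, no element matches and the target is left untouched.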
Blaster
  • shouldn't `echo $content ? $content '0';` rather be `echo $content ? $content : '0';`? –  Jul 04 '12 at 18:44
  • @Blaster can you please tell that what should be the content of the method? I want to set the content of this div as content of the other div say #someOtherDiv. – Harshdeep Jul 04 '12 at 18:44
  • @Harshdeep: I have already posted the code for it in my answer. Read the last line of bullet :) – Blaster Jul 04 '12 at 18:47
  • @Blaster How do I make the content of `div#leftColumn` as the content of my div. Should innerHTML be used for the same? – Harshdeep Jul 04 '12 at 18:52
  • @Harshdeep: `$('#yourDIV_ID').load('somepage.html div#leftcolumn', function(){ /* content loaded */ });` as I said in my answer :) – Blaster Jul 04 '12 at 18:56
  • @Blaster It is throwing `Call to undefined function file_get_content()`. I have checked the php.ini file and `allow_url_fopen` is set to on. What is this new problem :( – Harshdeep Jul 04 '12 at 19:04
  • 1
    @Harshdeep: Sorry it is `file_get_contents` not `file_get_content`, I was missing last `s` :) – Blaster Jul 04 '12 at 19:09
  • @Blaster It is only returning `[object Object]`. – Harshdeep Jul 04 '12 at 19:16
  • 1
    @Harshdeep: Well, that's a different issue and should be a different question, so that you also get attention from other people. Post a new question with all the relevant code to show how you are doing it. This question is answered: you cannot retrieve the contents of other domains because of the Same Origin Policy unless you use `jsonp` – Blaster Jul 04 '12 at 19:18
2

You will need to fetch the HTML with an HTTP request made from your server, then scrape out the part you wish to show in your page. That requires a server-side language. Unfortunately, Ajax/jQuery alone will not work because of browser security restrictions: most "Ajax" requests are subject to the same origin policy, so the request cannot successfully retrieve data from a different domain, subdomain, or protocol.

Control Freak
  • this! you could still use jquery to load that scraped page as an ajax request :) (but from your own server, I must add to reduce confusion) – mindandmedia Jul 04 '12 at 18:26
  • 1
    due to browser security restrictions, most "Ajax" requests are subject to the same origin policy; the request can not successfully retrieve data from a different domain, subdomain, or protocol. – Control Freak Jul 04 '12 at 18:27
  • @ZeeTee I know bit of PHP can you guide me how to accomplish the same in PHP and Thanks for the info :) – Harshdeep Jul 04 '12 at 18:38
  • I don't know php too well, but try this: http://www.php.net/manual/en/class.httprequest.php – Control Freak Jul 04 '12 at 18:39
0

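What I can think of:

<div style="display: none" id="container"></div>

and then do something like (shortcut @ https://stackoverflow.com/a/11333936/57508)

var $container = $('#container');
// .load() is asynchronous, so query the content inside the completion callback
$container.load('someurl-on-your-domain', function () {
  var $leftcolumn = $('div#leftcolumn', $container);
  $leftcolumn.appendTo($sthother); // $sthother: jQuery object for your target element
});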

According to a comment: yes, it is true, there's a same-origin policy (http://api.jquery.com/load/):

Due to browser security restrictions, most "Ajax" requests are subject to the same origin policy; the request can not successfully retrieve data from a different domain, subdomain, or protocol.

So why not create a proxy on your own domain and then use the output of the proxy? It's long-winded, true, but it works :)

Community
  • @downvoter could you explain the downvote? anything wrong with my answer? so that i can improve? ... –  Jul 04 '12 at 18:43
0

You would need to make a web service to pull the code in, because you cannot pull the data in via JavaScript due to security restrictions. This is known as the same origin policy and is linked elsewhere on this page.

You could use HtmlAgilityPack to parse it on the server side if you're working with ASP.NET technologies.

You would then be able to pull the data in from jQuery using .load(). The idea is to load it into a hidden div, such as:

$("#result").load("/webservice/pulldata.ashx");

and query it like you would any normal jQuery element.

Community
rtpHarry
  • I would let jQuery do the job, because it is more agile: if you need any other selector you'd have to recompile your ASP.NET project. btw there are many plain web proxies out there, so no need to sharpen the blade on both sides! –  Jul 04 '12 at 18:34
0

If you want to get around the browser's cross-domain restriction (the same origin policy), you can make the request from your own server and return the result. Example (PHP):

getContent.php

<?php $fileContent = file_get_contents("http://w3schools.com");
   echo $fileContent; ?>

Then you can use whatever you want to modify this content (even before echo).

Sample client script:

<div id="resultHtml"></div>
<script type="text/javascript">
$(document).ready(function(){
    // Load the output of getContent.php (same domain) into the div
    $("#resultHtml").load("getContent.php");
});
</script>
bksi
  • How do I proceed further? What next to get the div#leftColumn from this fileContent. Thanks :) – Harshdeep Jul 04 '12 at 18:40
  • 1
    @Harshdeep with another selector, eg `$('div#leftcolumn', $('#resultHtml'));` - c'mon ... read the other answers! each one is supplying you with a selector :) –  Jul 04 '12 at 18:41