I am trying to put the HTML source code of any webpage into a string using JavaScript. Please tell me if there is another way to solve my problem. I am using the following code, which I found in another post:

function httpGet(theUrl)
{
    // The third argument (false) makes the request synchronous,
    // so send() blocks until the response has arrived.
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open( "GET", theUrl, false );
    xmlHttp.send( null );
    return xmlHttp.responseText;
}

I tried this in IE, Firefox, and Chrome, but I always get the following source code, which is the source of a "PAGE NOT FOUND" page. If you need any other information, please let me know in a comment. What I am trying to do is get the HTML of any webpage, such as google.com. If I can't do that, what can I do instead?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head profile="http://gmpg.org/xfn/11">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>404 - PAGE NOT FOUND</title>
            <style type="text/css">
            body{padding:0;margin:0;font-family:helvetica;}
            #container{margin:20px auto;width:868px;}
            #container #top404{background-image:url('http://74.53.143.237/images/404top.gif');background-repeat:no-repeat;width:868px;height:168px;}
            #container #mid404{background-image:url('http://74.53.143.237/images/404mid.gif');background-repeat:repeat-y;width:868px;}
            #container #mid404 #gatorbottom{position:relative;left:39px;float:left;}
            #container #mid404 #xxx{float:left;padding:40px 237px 10px;}
            #container #mid404 #content{float:left;text-align:center;width:868px;}
            #container #mid404 #content #errorcode{font-size:30px;font-weight:800;}
            #container #mid404 #content p{font-weight:800;}
            #container #mid404 #content #banner{margin:20px 0 0 ;}
            #container #mid404 #content #hostedby{font-weight:800;font-size:25px;font-style:italic;margin:20px 0 0;}
            #container #mid404 #content #coupon{color:#AB0000;font-size:22px;font-style:italic;}
            #container #mid404 #content #getstarted a{color:#AB0000;font-size:31px;font-style:italic;font-weight:800;}
            #container #mid404 #content #getstarted {margin:0 0 35px;}
            #container #bottom404{background-image:url('http://74.53.143.237/images/404bottom.gif');background-repeat:no-repeat;width:868px;height:14px;}
            </style>
</head>
<body>
<div id="container">
    <div id="top404"></div>
    <div id="mid404">

            <div id="gatorbottom"><img src="http://74.53.143.237/images/gatorbottom.png" alt="" /></div>
            <div id="xxx"><img src="http://74.53.143.237/images/x.png" alt="" /></div>
    <div id="content">
            <div id="errorcode">ERROR 404 - PAGE NOT FOUND</div>
            <p>Oops! Looks like the page you're looking for was moved or never existed.<br />Make sure you typed the correct URL or followed a valid link.</p>

            <div id="banner">

                    <object width="728" height="90"><param name="movie" value="http://74.53.143.237/images/hg728x90.swf">

                            <embed src="http://74.53.143.237/images/hg728x90.swf?clickTAG=http://secure.hostgator.com/cgi-bin/affiliates/clickthru.cgi?id=page404" width="728" height="90"></embed>
                    </object>
            </div>

            <div id="hostedby">This site is hosted by HostGator!</div>
            <div id="coupon">Build your website today for 1 cent!   Coupon code: "404PAGE"</div>

            <div id="getstarted"><a href="http://www.hostgator.com/?utm_source=internal&utm_medium=link&utm_campaign=page404" title="HostGator Web Hosting" >CLICK HERE TO GET STARTED</a></div>

    </div>

    <div style="clear:left;"></div>
    </div>
    <div id="bottom404"></div>
</div>

</body>

</html>
Dchris
  • You can't access the contents of a page on a different domain only by using client-side JavaScript. – XCS Apr 13 '13 at 16:16
  • That is because your URL is wrong. Check in Fiddler or the Chrome console to see which URL is actually sent, then open it in the browser. http://stackoverflow.com/questions/15534640/ajax-origin-localhost-is-not-allowed-by-access-control-allow-origin/15537999#15537999 – PSL Apr 13 '13 at 16:16

1 Answer

> I am trying to put the html source code for any webpage in a string using Javascript

If by "any" you mean pages from origins other than the one your document is served from, you can't do that from JavaScript running in a browser. You're using an ajax call, and those are restricted by the Same Origin Policy, which says that (for instance) script running in a document on http://stackoverflow.com can't use ajax to load content from http://example.com. (An "origin" is more than just the domain name: it is the combination of scheme, host, and port.)
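
To make the idea of an origin concrete, here is a minimal sketch that compares two URLs with the standard `URL` class, whose `origin` property combines scheme, host, and port. The helper name `sameOrigin` is made up for illustration; it only approximates the comparison and is not the browser's actual enforcement code.

```javascript
// Hypothetical helper: true when two URLs share scheme, host, and port.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("http://stackoverflow.com/questions", "http://stackoverflow.com/tags")); // true
console.log(sameOrigin("http://stackoverflow.com", "http://example.com"));      // false (different host)
console.log(sameOrigin("http://example.com", "https://example.com"));           // false (different scheme)
console.log(sameOrigin("http://example.com", "http://example.com:8080"));       // false (different port)
```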

Some of the pages you request (but probably very few) may support Cross-Origin Resource Sharing (CORS). If they allow your origin, most likely by allowing all origins, you can use ajax to load their content.
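
As a rough illustration of what "allowing your origin" means, the sketch below models the check applied to the `Access-Control-Allow-Origin` response header for a request made without credentials. The function name `allowsOrigin` is hypothetical, and this is a simplified model rather than what a browser really runs (credentialed requests and preflights follow stricter rules).

```javascript
// Simplified model of the CORS check for a request made without credentials.
// Not the browser's real code; allowsOrigin is a made-up name.
function allowsOrigin(allowOriginHeader, pageOrigin) {
  if (allowOriginHeader == null) return false;   // header absent: no CORS support
  const value = allowOriginHeader.trim();
  return value === "*" || value === pageOrigin;  // wildcard or exact match
}

console.log(allowsOrigin("*", "http://stackoverflow.com"));                   // true
console.log(allowsOrigin("http://example.com", "http://stackoverflow.com"));  // false
console.log(allowsOrigin(null, "http://stackoverflow.com"));                  // false
```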

If you're running JavaScript outside the browser (NodeJS, SilkJS, RingoJS, Rhino, Windows Scripting Host, etc.), then the SOP wouldn't apply, but I suspect you'd probably need to use something other than the XMLHttpRequest object to do it.
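
For the outside-the-browser case, a minimal sketch might look like the following. It assumes Node.js 18 or later, where a global `fetch` is available; `fetchHtml` and its optional `fetchImpl` parameter are names invented here so the function can be exercised without a network connection.

```javascript
// Assumes Node.js 18+ (global fetch). fetchHtml is a hypothetical helper.
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // A 404 response like the one in the question ends up here.
    throw new Error("Request failed with status " + response.status);
  }
  return response.text(); // resolves to the page source as a string
}

// Usage (performs a real network request):
// fetchHtml("http://example.com").then((html) => console.log(html.length));
```

Rejecting on a non-2xx status is a design choice: it keeps an error page such as a 404 from being mistaken for the content you asked for.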

But fundamentally, in a web page (not an extension/add-on) in a browser, you can't do that.

> ...but i always get the ... source code for "PAGE NOT FOUND" page

But that sounds like the URL is just wrong.

T.J. Crowder
  • Yes, by "any" I mean pages from different origins, like google.com – Dchris Apr 13 '13 at 16:17
  • "The policy ... prevents access to most methods and properties across pages on different sites." - From the link provided above – OdinX Apr 13 '13 at 16:20
  • @Dchris: Then, as I said above, you can't. – T.J. Crowder Apr 13 '13 at 16:21
  • @Dchris: Cross-origin stuff is quite locked down in browsers. You can put the content in the page using an `iframe`, but you can't *access* it in code (and a large number of sites use "frame busting" to avoid being framed that way). So basically, no, not really, not purely client-side. You can, of course, have your page query **your** server and ask your server to get the target page, then send it back to you. And you can [use YQL as a cross-domain proxy](http://ajaxian.com/archives/using-yql-as-a-proxy-for-cross-domain-ajax), but I expect there are limits to what you can do there. – T.J. Crowder Apr 13 '13 at 16:35