
I know there are several screen scraping threads on here but none of the answers quite satisfied me.

I am trying to scrape the HTML from an external web page using JavaScript. I am using $.ajax, which I expected to work. Here is my code:

$.ajax({
    url: "my.url/path",
    dataType: 'text',
    success: function(data) {
        // data already holds the response text, so no second request is needed
        alert(data);
    }
});

The only problem is that it looks for the specified URL on my own web server. How do I use a proxy to get to an external web page?

kevin

1 Answer


Due to the browser's same-origin policy, client-side script cannot read pages from another domain, so you're going to have to pass the desired URL to a page on your server that queries that URL server side and then returns the results to you. Take a look at the thread below, incorporate that approach into your application, and have it return the page source when your AJAX function hits that page.

How to get the HTML source of a webpage in Ruby
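
The server-side fetch described above could look roughly like the following. This is only a minimal sketch using Ruby's standard library; the helper names `parse_target` and `fetch_source` are made up for illustration, and the check for an absolute http(s) URL is an added assumption rather than part of the linked thread.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: returns a parsed URI only when the input is an
# absolute http(s) URL; returns nil for anything else.
def parse_target(raw)
  uri = URI.parse(raw.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  return nil unless uri.is_a?(URI::HTTP) && uri.host
  uri
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: fetches the page body from the server side.
# Net::HTTP.get does not follow redirects; it returns the body as a String.
def fetch_source(raw)
  uri = parse_target(raw)
  raise ArgumentError, "not an absolute http(s) URL: #{raw.inspect}" unless uri
  Net::HTTP.get(uri)
end
```

Note that a value such as "my.url/path" has no scheme, so `parse_target` rejects it; a browser treats that kind of string as a path relative to the current site.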

Using a GET request is going to be the easiest way to pass the URL of the page you want to fetch to your server, so you'll be able to call something like:

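// "url" is an illustrative query parameter name; use whatever your server script expects
$.ajax("fetchPage.rb?url=" + encodeURIComponent("http://www.google.com"))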
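
On the server, the script then has to recover the target URL from the query string. A rough sketch, assuming the page URL arrives percent-encoded in a query parameter named `url` (a hypothetical name, as are the helpers `target_from_query` and `fetch_path`):

```ruby
require "uri"

# Hypothetical helper: pulls the target page URL out of a raw query string
# such as "url=http%3A%2F%2Fwww.google.com". The parameter name "url" is
# an assumption made for this example.
def target_from_query(query_string, param = "url")
  URI.decode_www_form(query_string.to_s).to_h[param]
end

# Hypothetical helper: builds the path a client would request, with the
# target URL percent-encoded so it survives as a single parameter value.
def fetch_path(script, target, param = "url")
  "#{script}?#{URI.encode_www_form(param => target)}"
end
```

For example, `fetch_path("fetchPage.rb", "http://www.google.com")` produces `fetchPage.rb?url=http%3A%2F%2Fwww.google.com`, and `target_from_query` turns that query string back into the original URL.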

Because you can't access the site in question directly from your server, you're going to have to route the server-side script's request through a proxy for it to work, and how you do that depends on your setup. Take a look at the Proxy method of Net::HTTP in Ruby:

http://ruby-doc.org/stdlib-1.9.3/libdoc/net/http/rdoc/Net/HTTP.html#method-c-Proxy
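
As a hedged sketch of what routing the request through a proxy might look like: `Net::HTTP.new` accepts the proxy address and port as its third and fourth arguments, the same values the `Proxy` method linked above takes. The proxy host `proxy.example.com`, the port `8080`, and the helper names are placeholders, not values from the question.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not open) a connection to the
# target host that will be routed through the given proxy.
# The default proxy host and port are placeholders; substitute your own.
def proxied_connection(uri, proxy_host = "proxy.example.com", proxy_port = 8080)
  http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  http.use_ssl = (uri.scheme == "https")
  http
end

# Hypothetical helper: fetches the page body through the proxy.
def fetch_via_proxy(raw_url)
  uri = URI.parse(raw_url)
  proxied_connection(uri).start { |http| http.get(uri.request_uri).body }
end
```

If your proxy requires authentication, `Net::HTTP.new` also accepts a proxy user name and password as further arguments.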

Community
TMan