
I have a website and need a way to get HTML data from a different website via an HTTP request. I've looked around for ways to implement it, and most sources say to use an AJAX call instead.

An AJAX call is blocked by LinkedIn, so I want to try a plain cross-domain HTTP request and hope it isn't blocked one way or another.

Midnight_Blaze
  • That "different web site" should be configured to accept requests from your domain. – Roman Feb 19 '17 at 10:53
  • Even plain HTTP requests? I know AJAX can be blocked, but can HTTP requests? – Midnight_Blaze Feb 19 '17 at 10:54
  • If your domain is not allowed by that web site, you can send requests, but they will be rejected regardless of request type or protocol. – Roman Feb 19 '17 at 10:57
  • @Midnight_Blaze how do you send an HTTP request without AJAX (aka XMLHttpRequest)? – m87 Feb 19 '17 at 10:57
  • I've been led to believe that the other website blocks AJAX specifically and disallows any domain other than its own from using AJAX. Still, it allows HTTP requests since it is a public website. – Midnight_Blaze Feb 19 '17 at 10:58
  • There are a few options within JS like CORS, JSONP, WebSockets or an iframe sandbox. Mostly you need to adjust the configuration of the other site. You can also use a backend proxy script that gets the needed info via cURL, for example, because it isn't subject to the same-origin policy. – curveball Feb 19 '17 at 11:00
  • AJAX is only necessary if you are talking about a *browser* request. If you want to initiate a request server-side from C#, you can use [WebClient or HttpWebRequest](http://stackoverflow.com/a/4988325/181087) instead. – NightOwl888 Feb 19 '17 at 11:04
  • And I would like to add that what blocks requests from JS to another domain isn't the website itself but rather the web browser, for security reasons. – curveball Feb 19 '17 at 11:05
  • I see, and if I can't send an AJAX call I can't use a regular HTTP request either? (I was also led to believe there was a difference between the two.) – Midnight_Blaze Feb 19 '17 at 11:08
  • As I understand it, you can send the request, but the browser will block your script from reading the response unless the other site sends CORS headers that allow your origin. For some requests the browser first sends a preflight request, and the real request is not sent if that check fails. If you have no control over the other domain and it doesn't provide a well-known interface like JSONP, I think your best bet is a backend proxy. Your AJAX call sends a request to a script on your own server. That script, free of the same-origin policy, can fetch the data you need from the other domain and send it to the frontend of your site. – curveball Feb 19 '17 at 11:13

1 Answer


If you have a server running and are able to run code on it, you can make the HTTP call server-side. Keep in mind, though, that most sites limit the number of calls per IP address, so you can't serve a lot of users this way.

This is a simple HttpListener that downloads a website's content when the query string contains ?site=http://linkedin.com:

using System.IO;   // StreamWriter
using System.Net;  // HttpListener, WebClient

// set up a listener
using (var listener = new HttpListener())
{
    // on port 8080, any host name
    listener.Prefixes.Add("http://+:8080/");
    listener.Start();
    while (true)
    {
        // wait for a connection (blocks until a request arrives)
        var ctx = listener.GetContext();
        var req = ctx.Request;
        var resp = ctx.Response;
        // default page
        var cnt = "<html><body><a href=\"/?site=http://linkedin.com\">click me</a> </body></html>";
        foreach (var key in req.QueryString.Keys)
        {
            if (key != null)
            {
                // if the url contains ?site=some url to a site
                switch (key.ToString())
                {
                    case "site":
                        // let's download; dispose the WebClient when done
                        using (var wc = new WebClient())
                        {
                            // store html in cnt
                            cnt = wc.DownloadString(req.QueryString[key.ToString()]);
                        }
                        // when needed you can do caching or processing here
                        // of the results, depending on your needs
                        break;
                    default:
                        break;
                }
            }
        }
        // output whatever is in cnt to the calling browser;
        // disposing the writer also closes the response stream
        using (var sw = new StreamWriter(resp.OutputStream))
        {
            sw.Write(cnt);
        }
    }
}

To make the above code work you might have to set permissions for the URL. If you're on your development box, run:

netsh http add urlacl url=http://+:8080/ user=Everyone listen=yes

On production use sane values for the user.

Once that is set run the above code and point your browser to

 http://localhost:8080/

(notice the / at the end)

You'll get a simple page with a link on it:

click me

Clicking that link sends a new request to the HttpListener, this time with the query string site=http://linkedin.com. The server-side code fetches the content at the given URL, in this case from LinkedIn.com. The result is sent back to the browser unchanged, but you can do post-processing, caching, etc., depending on your requirements.
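Since most sites limit calls per IP address, caching the downloaded pages is probably the first thing to add. Here is a minimal sketch of what that could look like. The `PageCache` class, its `Get` method and the idea of a fixed lifetime per entry are my own assumptions for illustration, not something from the code above. The download itself is passed in as a delegate, so the cache doesn't care how the page is fetched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the listener code above): keeps
// downloaded pages in memory so repeated requests for the same URL
// do not hit the remote site again until the entry expires.
public sealed class PageCache
{
    private readonly Func<string, string> _download;
    private readonly TimeSpan _ttl;
    private readonly Dictionary<string, Tuple<DateTime, string>> _entries =
        new Dictionary<string, Tuple<DateTime, string>>();
    private readonly object _gate = new object();

    // download: how to fetch a page; ttl: how long an entry stays fresh
    public PageCache(Func<string, string> download, TimeSpan ttl)
    {
        _download = download;
        _ttl = ttl;
    }

    public string Get(string url)
    {
        lock (_gate)
        {
            Tuple<DateTime, string> entry;
            if (_entries.TryGetValue(url, out entry) &&
                DateTime.UtcNow - entry.Item1 < _ttl)
            {
                return entry.Item2; // still fresh: no remote call
            }
        }

        // Download outside the lock so one slow site does not block
        // lookups for other URLs. Two simultaneous misses for the same
        // URL may both download; the last result wins.
        var html = _download(url);

        lock (_gate)
        {
            _entries[url] = Tuple.Create(DateTime.UtcNow, html);
        }
        return html;
    }
}
```

In the listener you would create one instance before the loop, passing a delegate that wraps `WebClient.DownloadString` and a lifetime such as `TimeSpan.FromMinutes(10)` (an arbitrary value), and then call `cache.Get(...)` in the `"site"` case instead of downloading directly. Entries are never evicted in this sketch, so a long-running server would need some cleanup.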

Legal notice/disclaimer

Most sites don't like being scraped this way, and their Terms of Service might actually forbid it. Make sure you don't do anything illegal that either harms the site's reliability or leads to legal action against you.

rene