I'm trying to fetch a web page with the following code so that I can scrape its data, but I keep getting this error: XMLHttpRequest cannot load https://websiteURL.com. Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:4200' is therefore not allowed access. I have read that I need to set the 'Access-Control-Allow-Origin' header. I have tried both a wildcard '*' and my localhost origin as its value, but neither works.

Here is my TypeScript code:

import { Component, OnInit } from '@angular/core';
import { Http, Response, RequestOptionsArgs, Headers } from '@angular/http';

....

constructor(private http: Http) { }

....

doScrape() {
    var header : Headers = new Headers();
    header.append('Access-Control-Allow-Origin', 'http://localhost:4200');
    var args : RequestOptionsArgs = {
      method: "GET",
      headers: header
    }

    console.log('Getting html...');
    this.http.get(this.b, args).subscribe(res => {
      console.log(res);
      this.htmlString = res.text();
    })
}

Why isn't this getting the job done?

Kevin

1 Answer

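Access-Control-Allow-Origin is a response header, not a request header. The server you are requesting has to send it; adding it to your own request has no effect on the check, and because it is a non-standard request header it is what makes the browser send a preflight request in the first place. See this answer for details: https://stackoverflow.com/a/10636765/1759462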
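
To make the direction of the header concrete, here is a minimal sketch of what the server side would have to do. The function name corsResponseHeaders and the allow-list parameter are hypothetical and only for illustration; the target site in the question evidently does nothing like this for your origin, which is why the browser blocks the response.

```typescript
// Hypothetical helper (not part of Angular or any library): builds the CORS
// headers a *server* would attach to its response for a given request origin.
function corsResponseHeaders(
  requestOrigin: string,
  allowedOrigins: string[]
): Record<string, string> {
  // If the origin is not on the allow-list, send no CORS headers at all.
  // The browser then refuses to expose the response to the page.
  if (allowedOrigins.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    // Echo the allowed origin back to the browser.
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response varies with the Origin request header.
    'Vary': 'Origin',
  };
}
```

A server that wanted to accept requests from your dev app would call something like corsResponseHeaders('http://localhost:4200', ['http://localhost:4200']) and add the result to its response. Nothing you set in the Angular request can stand in for that.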

I found this blog post quite helpful: https://medium.freecodecamp.org/client-side-web-scraping-with-javascript-using-jquery-and-regex-5b57a271cb86

The bad news is, you need to run these sorts of requests server-side to get around this issue.

[...]

The good news is, thanks to lots of other wonderful developers that have run into the same issues, you don’t have to touch the back end yourself.

Staying firmly within our front end script, we can use cross-domain tools such as Any Origin, Whatever Origin, All Origins, crossorigin and probably a lot more. I have found that you often need to test a few of these to find the one that will work on the site you are trying to scrape.

Some of these services are dead or unmaintained, so don't rely on them in production; consider running a proxy on your own server instead. For now, All Origins seems like a good choice.
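
As a rough sketch of the proxy approach, the request goes to the proxy with the target URL passed as an encoded query parameter, and the proxy returns the page with CORS headers that the browser accepts. The endpoint in PROXY_BASE is an assumption about All Origins' current public API, so check the project's documentation before using it. The function names are hypothetical, and the global fetch API is used here instead of Angular's Http only to keep the example self-contained.

```typescript
// Assumed All Origins endpoint that returns the raw body of the target page.
// Verify this against the service's documentation; it may change or disappear.
const PROXY_BASE = 'https://api.allorigins.win/raw?url=';

// Builds the proxied URL. The target must be URL-encoded so that its own
// "://" and "/" characters survive as a single query parameter value.
function buildProxyUrl(targetUrl: string, proxyBase: string = PROXY_BASE): string {
  return proxyBase + encodeURIComponent(targetUrl);
}

// Fetches the HTML of targetUrl through the proxy and resolves with the text.
async function fetchHtmlViaProxy(targetUrl: string): Promise<string> {
  const response = await fetch(buildProxyUrl(targetUrl));
  if (!response.ok) {
    throw new Error('Proxy request failed with status ' + response.status);
  }
  return response.text();
}
```

In the component from the question, doScrape() could then call this.http.get(buildProxyUrl(this.b)) without any custom headers. If you later host your own proxy, only the proxyBase argument needs to change.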

tobihagemann