
I'm currently building a JS Chrome extension, and to do so I need to scrape data from some sites.

Based on this SO question, I found that I could achieve that using request with Browserify.

I installed both using npm and created a browserify.js snippet to generate my bundle.js file (running terminal commands doesn't work for me because of permission issues), so that I can use Node.js require calls on the client, in my browser.

OK, so I finally managed to create the bundle.js file and tried to run it on my local server, but it keeps giving me a CORS error and doesn't return the desired response:

Fetch API cannot load https://somesite/index.html. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8080' is therefore not allowed access. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

One strange thing is that if I run the "unbundled" file directly from the terminal using node:

$ node myFileWithRequires.js

It works as intended and returns the scraped data.

What am I doing wrong? How can I scrape data on the client using request and Browserify?

CODE:

myBrowserifySnippet.js

const fs = require('fs');
const browserify = require('browserify');

// Bundle myrequest.js and its dependencies into bundle.js for the browser.
const b = browserify();
b.add('myrequest.js');
b.bundle().pipe(fs.createWriteStream('bundle.js'));

myFileWithRequires.js

var request = require('request');
request('http://www.google.com', function (error, response, body) {
    console.log('error:', error); // Print the error if one occurred
    console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
    console.log('body:', body); // Print the HTML for the Google homepage.
});
Lioo
  • You don't want to use `request` on the client; it is too heavy (> 1MB). Instead, the `xhr` package implements a subset of the request API using `XMLHttpRequest`, and it is designed to be used with browserify. – Mehdi Sep 13 '17 at 14:32
  • Your problem is most probably related to the CORS settings on `somesite`, not to request. `somesite` does not allow requests coming from another site, and browsers respect that. – Mehdi Sep 13 '17 at 14:34
  • @MehdiElFadil, so how can I scrape the data without this `request` solution? Do you have any idea? – Lioo Sep 13 '17 at 14:43
  • In the browser, you can't scrape other websites unless the site owner is willing to whitelist your domain. Alternatively, you can use a CORS proxy, so that the actual request happens on your server side (not restricted by CORS settings), like [cors-anywhere](https://github.com/Rob--W/cors-anywhere/#documentation) for example. Edit: never mind, you're inside a Chrome extension, so it's different. One moment! – goto-bus-stop Sep 13 '17 at 14:52

1 Answer


By default, XHR and fetch requests are bound by CORS, which means they cannot access resources on other domains unless those domains whitelist the 'origin' (the current page's domain). In the browser, request is built on those same browser APIs (the error in the question comes from the Fetch API), so it's also bound by CORS.

In Chrome extensions, it's a bit different: you can configure your extension so that CORS doesn't apply to some domains. See "Requesting cross-origin permissions" in the Chrome extension documentation.

You need to add a permissions field to your extension's manifest.json:

{
  "permissions": [
    "http://www.google.com/"
  ]
}

If you're not sure beforehand which domain you'll be scraping, you can use a wildcard:

{
  "permissions": [
    "http://*/",
    "https://*/"
  ]
}
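
With one of the permissions above in place, a request made from an extension page (a background page or popup, for example) should no longer hit the CORS error. Here is a minimal sketch of what that could look like using the browser's built-in fetch instead of request. The name `scrapePage` and its `fetchImpl` parameter are hypothetical and only there for illustration:

```javascript
// Hypothetical helper: fetches a page and resolves with its HTML as a string.
// `fetchImpl` defaults to the global fetch; it is a parameter only so the
// function can be exercised with a stub outside the extension.
async function scrapePage(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Surface HTTP errors instead of silently returning an error page.
    throw new Error('Request failed with status ' + response.status);
  }
  return response.text();
}

// Possible usage inside the extension (assumes the host permission is granted):
// scrapePage('http://www.google.com/')
//   .then(function (body) { console.log('body:', body); })
//   .catch(function (error) { console.log('error:', error); });
```

Since this relies only on fetch, it would not need request or a Browserify bundle at all, although the bundled request code should behave the same way once the permission is granted.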
goto-bus-stop
  • I wasn't testing in the extension yet, but I will try it and report back with the results. Thanks – Lioo Sep 13 '17 at 15:07